Preface to the First Edition


Stochastic processes are ways of quantifying the dynamic relationships of 
sequences of random events. Stochastic models play an important role in 
elucidating many areas of the natural and engineering sciences. They can 
be used to analyze the variability inherent in biological and medical 
processes, to deal with uncertainties affecting managerial decisions and 
with the complexities of psychological and social interactions, and to pro- 
vide new perspectives, methodology, models, and intuition to aid in other 
mathematical and statistical studies. 

This book is intended as a beginning text in stochastic processes for stu- 
dents familiar with elementary probability calculus. Its aim is to bridge 
the gap between basic probability know-how and an intermediate-level 
course in stochastic processes, for example, A First Course in Stochastic
Processes, by the present authors.

The objectives of this book are three: (1) to introduce students to the 
standard concepts and methods of stochastic modeling; (2) to illustrate the 
rich diversity of applications of stochastic processes in the sciences; and 
(3) to provide exercises in the application of simple stochastic analysis to 
appropriate problems. 

The chapters are organized around several prototype classes of sto- 
chastic processes featuring Markov chains in discrete and continuous 
time, Poisson processes and renewal theory, the evolution of branching 
events, and queueing models. After the concluding Chapter IX, we pro- 
vide a list of books that incorporate more advanced discussions of several 
of the models set forth in this text. 


Preface to the Third Edition 


The purposes, level, and style of this new edition conform to the tenets set 
forth in the original preface. We continue with our objective of introduc- 
ing some theory and applications of stochastic processes to students hav- 
ing a solid foundation in calculus and in calculus-level probability, but 
who are not conversant with the “epsilon-delta” definitions of mathematical
analysis. We hope to entice students towards the deeper study of
mathematics that is prerequisite to further work in stochastic processes by 
showing the myriad and interesting ways in which stochastic models can 
help us understand the real world. 

We have removed some topics and added others. We added a small sec- 
tion on martingales that includes an example suggesting the martingale 
concept as appropriate for modeling the prices of assets traded in a perfect 
market. A new chapter introduces the Brownian motion process and in- 
cludes several applications of it and its variants in financial modeling. In 
this chapter the Black-Scholes formula for option pricing is evaluated and 
compared with some reported prices of options. A Poisson process whose 
intensity is itself a stochastic process is described in another new section. 

Some treatments have been updated. The law of rare events is presented
via an inequality that measures the accuracy of a Poisson approximation 
for the distribution of the sum of independent, not necessarily identically 
distributed, Bernoulli random variables. We have added the shot noise 
model and related it to a random sum. 

The text contains more than 250 exercises and 350 problems. Exercises 
are elementary drills intended to promote active learning, to develop fa- 
miliarity with concepts through use. They often simply involve the sub- 
stitution of numbers into given formulas, or reasoning one or two steps 
away from a definition. They are the kinds of simple questions that we, as
instructors, hope that students would pose and answer for themselves as
they read a text. Answers to the exercises are given at the end of the book 
so that students may gauge their understanding as they go along. 

Problems are more difficult. Some involve extensive algebraic or cal- 
culus manipulation. Many are “word problems” wherein the student is 
asked, in effect, to model some described scenario. As in formulating a 
model, the first step in the solution of a word problem is often a sentence 
of the form “Let x = ....” A manual containing the solutions to the prob- 
lems is available from the publisher. 

A reasonable strategy on the part of the teacher might be to hold stu- 
dents responsible for all of the exercises, but to require submitted solu- 
tions only to selected problems. Every student should attempt a represen- 
tative selection of the problems in order to develop his or her ability to 
carry out stochastic modeling in his or her area of interest. 

A small number of problems are labeled “Computer Challenges.” These 
call for more than pencil and paper for their analyses, and either simula- 
tion, numerical exploration, or symbol manipulation may prove helpful. 
Computer Challenges are meant to be open-ended, intended to explore 
what constitutes an answer in today’s world of computing power. They 
might be appropriate as part of an honors requirement. 

Because our focus is on stochastic modeling, in some instances we have 
omitted a proof and contented ourselves with a precise statement of a 
result and examples of its application. All such omitted proofs may be 
found in A First Course in Stochastic Processes, by the present authors. 
In this more advanced text, the ambitious student will also find additional 
material on martingales, Brownian motion, and renewal processes, and 
presentations of several other classes of stochastic processes. 


To the Instructor 


If possible, we recommend having students skim the first two chapters, re- 
ferring as necessary to the probability review material, and starting the 
course with Chapter III, on Markov chains. A one quarter course adapted 
to the junior-senior level could consist of a cursory (one-week) review of
Chapters I and II, followed in order by Chapters III through VI. For inter- 
ested students, Chapters VII, VIII, and IX discuss other currently active 
areas of stochastic modeling. Starred sections contain material of a more 
advanced or specialized nature. 
Chapter I
Introduction 


1. Stochastic Modeling 


A quantitative description of a natural phenomenon is called a mathe- 
matical model of that phenomenon. Examples abound, from the simple 
equation $S = \frac{1}{2}gt^2$ describing the distance $S$ traveled in time $t$ by a falling
object starting at rest to a complex computer program that simulates a 
biological population or a large industrial system. 

In the final analysis, a model is judged using a single, quite pragmatic, 
factor, the model’s usefulness. Some models are useful as detailed quanti- 
tative prescriptions of behavior, as for example, an inventory model that 
is used to determine the optimal number of units to stock. Another model 
in a different context may provide only general qualitative information 
about the relationships among and relative importance of several factors 
influencing an event. Such a model is useful in an equally important but 
quite different way. Examples of diverse types of stochastic models are 
spread throughout this book. 

Such often mentioned attributes as realism, elegance, validity, and 
reproducibility are important in evaluating a model only insofar as they 
bear on that model’s ultimate usefulness. For instance, it is both unrealis- 
tic and quite inelegant to view the sprawling city of Los Angeles as a geo- 
metrical point, a mathematical object of no size or dimension. Yet it is 
quite useful to do exactly that when using spherical geometry to derive a 
minimum-distance great circle air route from New York City, another 
“point.” 
There is no such thing as the best model for a given phenomenon. The
pragmatic criterion of usefulness often allows the existence of two or 
more models for the same event, but serving distinct purposes. Consider 
light. The wave form model, in which light is viewed as a continuous flow, 
is entirely adequate for designing eyeglass and telescope lenses. In con- 
trast, for understanding the impact of light on the retina of the eye, the 
photon model, which views light as tiny discrete bundles of energy, is 
preferred. Neither model supersedes the other; both are relevant and 
useful. 

The word “stochastic” derives from the Greek (στοχάζεσθαι, to aim, to
guess) and means “random” or “chance.” The antonym is “sure,” “deter- 
ministic,” or “certain.” A deterministic model predicts a single outcome 
from a given set of circumstances. A stochastic model predicts a set of 
possible outcomes weighted by their likelihoods, or probabilities. A coin 
flipped into the air will surely return to earth somewhere. Whether it lands 
heads or tails is random. For a “fair” coin we consider these alternatives 
equally likely and assign to each the probability 1/2.

However, phenomena are not in and of themselves inherently stochas- 
tic or deterministic. Rather, to model a phenomenon as stochastic or de- 
terministic is the choice of the observer. The choice depends on the ob- 
server’s purpose; the criterion for judging the choice is usefulness. Most 
often the proper choice is quite clear, but controversial situations do arise. 
If the coin once fallen is quickly covered by a book so that the outcome,
“heads” or “tails,” remains unknown, two participants may still usefully
employ probability concepts to evaluate what is a fair bet between them; 
that is, they may usefully view the coin as random, even though most peo- 
ple would consider the outcome now to be fixed or deterministic. As a less 
mundane example of the converse situation, changes in the level of a large 
population are often usefully modeled deterministically, in spite of the 
general agreement among observers that many chance events contribute 
to their fluctuations. 

Scientific modeling has three components: (i) a natural phenomenon 
under study, (ii) a logical system for deducing implications about the phe- 
nomenon, and (iii) a connection linking the elements of the natural system 
under study to the logical system used to model it. If we think of these 
three components in terms of the great-circle air route problem, the nat- 
ural system is the earth with airports at Los Angeles and New York; the 
logical system is the mathematical subject of spherical geometry; and the 
two are connected by viewing the airports in the physical system as points
in the logical system. 

The modern approach to stochastic modeling is in a similar spirit. Na- 
ture does not dictate a unique definition of “probability,” in the same way 
that there is no nature-imposed definition of “point” in geometry. “Proba- 
bility” and “point” are terms in pure mathematics, defined only through 
the properties invested in them by their respective sets of axioms. (See 
Section 2.8 for a review of axiomatic probability theory.) There are, how- 
ever, three general principles that are often useful in relating or connect- 
ing the abstract elements of mathematical probability theory to a real or 
natural phenomenon that is to be modeled. These are (i) the principle of
equally likely outcomes, (ii) the principle of long run relative frequency, 
and (iii) the principle of odds making or subjective probabilities. Histori- 
cally, these three concepts arose out of largely unsuccessful attempts to 
define probability in terms of physical experiences. Today, they are rele- 
vant as guidelines for the assignment of probability values in a model, and 
for the interpretation of the conclusions of a model in terms of the phe- 
nomenon under study. 

We illustrate the distinctions between these principles with a long ex- 
periment. We will pretend that we are part of a group of people who de- 
cide to toss a coin and observe the event that the coin will fall heads up. 
This event is denoted by H, and the event of tails, by T. 

Initially, everyone in the group agrees that Pr{H} = 1/2. When asked why,
people give two reasons: Upon checking the coin construction, they be- 
lieve that the two possible outcomes, heads and tails, are equally likely; 
and extrapolating from past experience, they also believe that if the coin 
is tossed many times, the fraction of times that heads is observed will be 
close to one-half. 

The equally likely interpretation of probability surfaced in the works of 
Laplace in 1812, where the attempt was made to define the probability of 
an event A as the ratio of the total number of ways that A could occur to 
the total number of possible outcomes of the experiment. The equally 
likely approach is often used today to assign probabilities that reflect some 
notion of a total lack of knowledge about the outcome of a chance phe- 
nomenon. The principle requires judicious application if it is to be useful, 
however. In our coin tossing experiment, for instance, merely introducing 
the possibility that the coin could land on its edge (E) instantly results in
Pr{H} = Pr{T} = Pr{E} = 1/3.
The next principle, the long run relative frequency interpretation of
probability, is a basic building block in modern stochastic modeling, made 
precise and justified within the axiomatic structure by the law of large 
numbers. This law asserts that the relative fraction of times in which an 
event occurs in a sequence of independent similar experiments ap- 
proaches, in the limit, the probability of the occurrence of the event on any 
single trial. 
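
The convergence asserted by the law of large numbers can be watched in a small simulation. The sketch below is only illustrative: the function name, the seed, and the use of Python's pseudorandom generator as a stand-in for a fair coin are assumptions, not part of the text.

```python
import random

def relative_frequency_of_heads(num_tosses, p_heads=0.5, seed=0):
    """Fraction of heads observed in num_tosses independent simulated tosses."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    heads = sum(1 for _ in range(num_tosses) if rng.random() < p_heads)
    return heads / num_tosses

# The relative frequency should settle near p_heads as the number of tosses grows.
for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

For small n the printed fraction can be far from one-half; the law concerns only the limiting behavior.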

The principle is not relevant in all situations, however. When the sur- 
geon tells a patient that he has an 80-20 chance of survival, the surgeon 
means, most likely, that 80 percent of similar patients facing similar 
surgery will survive it. The patient at hand is not concerned with the long 
run, but in vivid contrast, is vitally concerned only in the outcome of his, 
the next, trial. 

Returning to the group experiment, we will suppose next that the coin is 
flipped into the air and, upon landing, is quickly covered so that no one can 
see the outcome. What is Pr{H} now? Several in the group argue that the 
outcome of the coin is no longer random, that Pr{H} is either 0 or 1, and
that although we don’t know which it is, probability theory does not apply. 

Others articulate a different view, that the distinction between “ran- 
dom” and “lack of knowledge” is fuzzy, at best, and that a person with a 
sufficiently large computer and sufficient information about such factors 
as the energy, velocity, and direction used in tossing the coin could have 
predicted the outcome, heads or tails, with certainty before the toss. 
Therefore, even before the coin was flipped, the problem was a lack of 
knowledge and not some inherent randomness in the experiment. 

In a related approach, several people in the group are willing to bet with 
each other, at even odds, on the outcome of the toss. That is, they are will- 
ing to use the calculus of probability to determine what is a fair bet, with- 
out considering whether the event under study is random or not. The use- 
fulness criterion for judging a model has appeared. 

While the rest of the mob were debating “random” versus “lack of 
knowledge,” one member, Karen, looked at the coin. Her probability for 
heads is now different from that of everyone else. Keeping the coin cov- 
ered, she announces the outcome “Tails,” whereupon everyone mentally 
assigns the value Pr{H} = 0. But then her companion, Mary, speaks up 
and says that Karen has a history of prevarication. 

The last scenario explains why there are horse races; different people 
assign different probabilities to the same event. For this reason, probabilities
used in odds making are often called subjective probabilities. Then,
odds making forms the third principle for assigning probability values in 
models and for interpreting them in the real world. 

The modern approach to stochastic modeling is to divorce the definition
of probability from any particular type of application. Probability theory
is an axiomatic structure (see Section 2.8), a part of pure mathematics. Its
use in modeling stochastic phenomena is part of the broader realm of science
and parallels the use of other branches of mathematics in modeling
deterministic phenomena. 

To be useful, a stochastic model must reflect all those aspects of the 
phenomenon under study that are relevant to the question at hand. In ad- 
dition, the model must be amenable to calculation and must allow the de- 
duction of important predictions or implications about the phenomenon. 


1.1. Stochastic Processes 


A stochastic process is a family of random variables $X_t$, where $t$ is a
parameter running over a suitable index set $T$. (Where convenient, we will
write $X(t)$ instead of $X_t$.) In a common situation, the index $t$ corresponds
to discrete units of time, and the index set is $T = \{0, 1, 2, \ldots\}$. In this
case, $X_t$ might represent the outcomes at successive tosses of a coin,
repeated responses of a subject in a learning experiment, or successive
observations of some characteristics of a certain population. Stochastic
processes for which $T = [0, \infty)$ are particularly important in applications.
Here $t$ often represents time, but different situations also frequently arise.
For example, $t$ may represent distance from an arbitrary origin, and $X_t$ may
count the number of defects in the interval $(0, t]$ along a thread, or the
number of cars in the interval $(0, t]$ along a highway.

Stochastic processes are distinguished by their state space, or the range
of possible values for the random variables $X_t$, by their index set $T$, and by
the dependence relations among the random variables $X_t$. The most widely
used classes of stochastic processes are systematically and thoroughly 
presented for study in the following chapters, along with the mathemati- 
cal techniques for calculation and analysis that are most useful with these 
processes. The use of these processes as models is taught by example. 
Sample applications from many and diverse areas of interest are an inte- 
gral part of the exposition. 
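
To make the ideas of index set, state space, and dependence concrete, here is a minimal sketch of two discrete-time processes built from simulated coin tosses. The function names and the seed are illustrative assumptions. The tosses $X_t$ take values in {0, 1} and are generated independently, while the running head count has state space {0, 1, 2, ...} and each value depends on the one before it.

```python
import random

def coin_toss_path(num_steps, seed=0):
    """Return [X_0, ..., X_{num_steps-1}], with X_t = 1 for heads and 0 for tails."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(num_steps)]

def running_head_count(path):
    """Return [N_0, N_1, ...], where N_t = X_0 + ... + X_t."""
    counts, total = [], 0
    for x in path:
        total += x
        counts.append(total)
    return counts

path = coin_toss_path(10)
print("X_t:", path)
print("N_t:", running_head_count(path))
```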
2. Probability Review*


This section summarizes the necessary background material and establishes
the book’s terminology and notation. It also illustrates the level of
the exposition in the following chapters. Readers who find the major part 
of this section’s material to be familiar and easily understood should have 
no difficulty with what follows. Others might wish to review their proba- 
bility background before continuing. 

In this section statements frequently are made without proof. The 
reader desiring justification should consult any elementary probability 
text as the need arises. 


2.1. Events and Probabilities 


The reader is assumed to be familiar with the intuitive concept of an event. 
(Events are defined rigorously in Section 2.8, which reviews the axiomatic 
structure of probability theory.) 

Let $A$ and $B$ be events. The event that at least one of $A$ or $B$ occurs is
called the union of $A$ and $B$ and is written $A \cup B$; the event that both occur
is called the intersection of $A$ and $B$ and is written $A \cap B$, or simply $AB$.
This notation extends to finite and countable sequences of events. Given
events $A_1, A_2, \ldots$, the event that at least one occurs is written
$A_1 \cup A_2 \cup \cdots = \bigcup_{i=1}^{\infty} A_i$; the event that all occur is written
$A_1 \cap A_2 \cap \cdots = \bigcap_{i=1}^{\infty} A_i$.

The probability of an event $A$ is written $\Pr\{A\}$. The certain event,
denoted by $\Omega$, always occurs, and $\Pr\{\Omega\} = 1$. The impossible event,
denoted by $\emptyset$, never occurs, and $\Pr\{\emptyset\} = 0$. It is always the case that
$0 \le \Pr\{A\} \le 1$ for any event $A$.

Events $A$, $B$ are said to be disjoint if $A \cap B = \emptyset$; that is, if $A$ and $B$
cannot both occur. For disjoint events $A$, $B$ we have the addition law
$\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\}$. A stronger form of the addition law is as
follows: Let $A_1, A_2, \ldots$ be events with $A_i$ and $A_j$ disjoint whenever $i \ne j$.
Then $\Pr\{\bigcup_{i=1}^{\infty} A_i\} = \sum_{i=1}^{\infty} \Pr\{A_i\}$. The addition law leads directly to the law
of total probability: Let $A_1, A_2, \ldots$ be disjoint events for which
$\Omega = A_1 \cup A_2 \cup \cdots$. Equivalently, exactly one of the events $A_1, A_2, \ldots$ will occur. The law
of total probability asserts that $\Pr\{B\} = \sum_{i=1}^{\infty} \Pr\{B \cap A_i\}$ for any event $B$.
The law enables the calculation of the probability of an event $B$ from
the sometimes more easily determined probabilities $\Pr\{B \cap A_i\}$, where
$i = 1, 2, \ldots$. Judicious choice of the events $A_i$ is prerequisite to the
profitable application of the law.


* Many readers will prefer to omit this review and move directly to Chapter III, on
Markov chains. They can then refer to the background material that is summarized in the
remainder of this chapter and in Chapter II only as needed.

Events $A$ and $B$ are said to be independent if $\Pr\{A \cap B\} = \Pr\{A\} \times \Pr\{B\}$.
Events $A_1, A_2, \ldots$ are independent if

$$\Pr\{A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_n}\} = \Pr\{A_{i_1}\} \Pr\{A_{i_2}\} \cdots \Pr\{A_{i_n}\}$$

for every finite set of distinct indices $i_1, i_2, \ldots, i_n$.
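
The addition law, the law of total probability, and the definition of independence can all be checked by direct enumeration on a small sample space. The sketch below assumes a model that is not in the text: two fair dice with all 36 ordered outcomes equally likely. Events are represented as sets of outcomes, and the helper name `pr` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: ordered outcomes of two fair dice, all 36 equally likely.
OMEGA = frozenset(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a subset of OMEGA) under equally likely outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))

first_is_six = frozenset(w for w in OMEGA if w[0] == 6)
first_is_one = frozenset(w for w in OMEGA if w[0] == 1)
sum_is_seven = frozenset(w for w in OMEGA if w[0] + w[1] == 7)

# Addition law for disjoint events.
assert not (first_is_six & first_is_one)
assert pr(first_is_six | first_is_one) == pr(first_is_six) + pr(first_is_one)

# Law of total probability: partition OMEGA by the value of the first die.
partition = [frozenset(w for w in OMEGA if w[0] == k) for k in range(1, 7)]
assert pr(sum_is_seven) == sum(pr(sum_is_seven & a) for a in partition)

# Independence: the probability of the intersection equals the product.
assert pr(first_is_six & sum_is_seven) == pr(first_is_six) * pr(sum_is_seven)
print(pr(sum_is_seven))  # prints: 1/6
```

Exact rational arithmetic is used so that the equalities hold exactly rather than up to rounding.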


2.2. Random Variables


An old-fashioned but very useful and highly intuitive definition describes 
a random variable as a variable that takes on its values by chance. In Sec- 
tion 2.8, we sketch the modern axiomatic structure for probability theory 
and random variables. The older definition just given serves quite ade- 
quately, however, in virtually all instances of stochastic modeling. Indeed, 
this older definition was the only approach available for well over a century
of meaningful progress in probability theory and stochastic
processes.

Most of the time we adhere to the convention of using capital letters 
such as X, Y, Z to denote random variables, and lowercase letters such as 
$x, y, z$ for real numbers. The expression $\{X \le x\}$ is the event that the
random variable $X$ assumes a value that is less than or equal to the real
number $x$. This event may or may not occur, depending on the outcome of the
experiment or phenomenon that determines the value for the random
variable $X$. The probability that the event occurs is written $\Pr\{X \le x\}$.
Allowing $x$ to vary, this probability defines a function


$$F(x) = \Pr\{X \le x\}, \qquad -\infty < x < +\infty,$$


called the distribution function of the random variable $X$. Where several
random variables appear in the same context, we may choose to distinguish
their distribution functions with subscripts, writing, for example,
$F_X(\xi) = \Pr\{X \le \xi\}$ and $F_Y(\xi) = \Pr\{Y \le \xi\}$, defining the distribution
functions of the random variables $X$ and $Y$, respectively, as functions of
the real variable $\xi$.

The distribution function contains all the information available about a 
random variable before its value is determined by experiment. We have, 
for instance, $\Pr\{X > a\} = 1 - F(a)$, $\Pr\{a < X \le b\} = F(b) - F(a)$, and
$\Pr\{X = x\} = F(x) - \lim_{\epsilon \downarrow 0} F(x - \epsilon) = F(x) - F(x-)$.

A random variable $X$ is called discrete if there is a finite or denumerable
set of distinct values $x_1, x_2, \ldots$ such that $a_i = \Pr\{X = x_i\} > 0$ for $i = 1, 2, \ldots$
and $\sum_i a_i = 1$. The function


$$p(x_i) = p_X(x_i) = a_i \quad \text{for } i = 1, 2, \ldots \tag{2.1}$$


is called the probability mass function for the random variable $X$ and is
related to the distribution function via


$$p(x_i) = F(x_i) - F(x_i-) \quad \text{and} \quad F(x) = \sum_{x_i \le x} p(x_i).$$


The distribution function for a discrete random variable is a step function,
which increases only in jumps, the size of the jump at $x_i$ being $p(x_i)$.
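
The two-way relation between a probability mass function and its step distribution function can be sketched in a few lines. The mass function below is a hypothetical example chosen only for illustration, and the helper names are assumptions.

```python
from fractions import Fraction

# Hypothetical discrete random variable: values x_i with probabilities a_i.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def cdf(x):
    """F(x): sum of p(x_i) over all x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

def left_limit(x):
    """F(x-): sum of p(x_i) over all x_i < x."""
    return sum((p for xi, p in pmf.items() if xi < x), Fraction(0))

def jump(x):
    """Pr{X = x} recovered as F(x) - F(x-)."""
    return cdf(x) - left_limit(x)

assert sum(pmf.values()) == 1
assert all(jump(xi) == p for xi, p in pmf.items())  # jumps recover the mass function
assert jump(2) == 0                                 # no jump between the atoms
assert cdf(2) - cdf(0) == pmf[1]                    # Pr{0 < X <= 2} = F(2) - F(0)
```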

If $\Pr\{X = x\} = 0$ for every value of $x$, then the random variable $X$
is called continuous and its distribution function $F(x)$ is a continuous
function of $x$. If there is a nonnegative function $f(x) = f_X(x)$ defined for
$-\infty < x < \infty$ such that


$$\Pr\{a < X \le b\} = \int_a^b f(x)\,dx \quad \text{for } -\infty < a < b < \infty, \tag{2.2}$$


then $f(x)$ is called the probability density function for the random variable
$X$. If $X$ has a probability density function $f(x)$, then $X$ is continuous and


$$F(x) = \int_{-\infty}^{x} f(\xi)\,d\xi, \qquad -\infty < x < \infty.$$


If F(x) is differentiable in x, then X has a probability density function 
given by 


$$f(x) = \frac{d}{dx} F(x) = F'(x), \qquad -\infty < x < \infty. \tag{2.3}$$


In differential form, (2.3) leads to the informal statement 


$$\Pr\{x < X \le x + dx\} = F(x + dx) - F(x) = dF(x) = f(x)\,dx. \tag{2.4}$$
We consider (2.4) to be a shorthand version of the more precise statement

$$\Pr\{x < X \le x + \Delta x\} = f(x)\,\Delta x + o(\Delta x), \qquad \Delta x \downarrow 0, \tag{2.5}$$


where $o(\Delta x)$ is a generic remainder term of order less than $\Delta x$ as $\Delta x \downarrow 0$.
That is, $o(\Delta x)$ represents any term for which $\lim_{\Delta x \downarrow 0} o(\Delta x)/\Delta x = 0$. By the
fundamental theorem of calculus, equation (2.5) is valid whenever the 
probability density function is continuous at x. 
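
The meaning of the remainder term in (2.5) can be seen numerically. The sketch below assumes, purely for illustration, an exponential distribution with rate 2, for which $F(x) = 1 - e^{-2x}$ and $f(x) = 2e^{-2x}$ when $x > 0$; the names `RATE` and `remainder_ratio` are hypothetical.

```python
import math

RATE = 2.0  # hypothetical parameter of an exponential distribution

def F(x):
    """Distribution function of the exponential distribution with rate RATE."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def f(x):
    """Probability density function, the derivative of F for x > 0."""
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0

def remainder_ratio(x, dx):
    """o(dx)/dx, where o(dx) = Pr{x < X <= x + dx} - f(x) dx."""
    return (F(x + dx) - F(x) - f(x) * dx) / dx

# The ratio should shrink toward zero as dx decreases.
for dx in (1e-1, 1e-2, 1e-3, 1e-4):
    print(dx, remainder_ratio(1.0, dx))
```

In this example the ratio shrinks roughly in proportion to the increment, which is what "of order less than $\Delta x$" requires.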

While examples are known of continuous random variables that do not 
possess probability density functions, they do not arise in stochastic models 
of common natural phenomena. 


2.3. Moments and Expected Values 


If $X$ is a discrete random variable, then its $m$th moment is given by

$$E[X^m] = \sum_i x_i^m \Pr\{X = x_i\}, \tag{2.6}$$


[where the $x_i$ are specified in (2.1)] provided that the infinite sum
converges absolutely. Where the infinite sum diverges, the moment is said not
to exist. If $X$ is a continuous random variable with probability density
function $f(x)$, then its $m$th moment is given by


$$E[X^m] = \int_{-\infty}^{+\infty} x^m f(x)\,dx, \tag{2.7}$$

provided that this integral converges absolutely.

The first moment, corresponding to $m = 1$, is commonly called the
mean or expected value of $X$ and written $m_X$ or $\mu_X$. The $m$th central
moment of $X$ is defined as the $m$th moment of the random variable $X - \mu_X$,
provided that $\mu_X$ exists. The first central moment is zero. The second
central moment is called the variance of $X$ and written $\sigma_X^2$ or $\mathrm{Var}[X]$. We have
the equivalent formulas $\mathrm{Var}[X] = E[(X - \mu)^2] = E[X^2] - \mu^2$.
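
As a worked check of these definitions, the sketch below computes moments for a hypothetical example, the face shown by a fair six-sided die, using exact fractions. The function names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical example: X is the face shown by a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def moment(m):
    """m-th moment E[X^m], computed as in (2.6)."""
    return sum(x**m * p for x, p in pmf.items())

def central_moment(m):
    """m-th central moment E[(X - mu)^m]."""
    mu = moment(1)
    return sum((x - mu)**m * p for x, p in pmf.items())

mu = moment(1)
variance = central_moment(2)
assert central_moment(1) == 0         # the first central moment is zero
assert variance == moment(2) - mu**2  # Var[X] = E[X^2] - mu^2
print(mu, variance)                   # prints: 7/2 35/12
```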

The median of a random variable $X$ is any value $\nu$ with the property that


$$\Pr\{X \ge \nu\} \ge \tfrac{1}{2} \quad \text{and} \quad \Pr\{X \le \nu\} \ge \tfrac{1}{2}.$$
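
The phrase "any value" matters: a median need not be unique. A minimal sketch, again assuming a fair six-sided die as a hypothetical example, tests candidate values against the two defining inequalities.

```python
from fractions import Fraction

# Hypothetical example: X is the face shown by a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def is_median(v):
    """True if Pr{X >= v} >= 1/2 and Pr{X <= v} >= 1/2."""
    upper = sum((p for x, p in pmf.items() if x >= v), Fraction(0))
    lower = sum((p for x, p in pmf.items() if x <= v), Fraction(0))
    half = Fraction(1, 2)
    return upper >= half and lower >= half

print([v for v in (2, 3, 3.5, 4, 5) if is_median(v)])  # prints: [3, 3.5, 4]
```

Every value between 3 and 4, inclusive, qualifies as a median in this example.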


If $X$ is a random variable and $g$ is a function, then $Y = g(X)$ is also a
random variable. If $X$ is a discrete random variable with possible values
$x_1, x_2, \ldots$, then the expectation of $g(X)$ is given by


$$E[g(X)] = \sum_i g(x_i) \Pr\{X = x_i\}, \tag{2.8}$$


provided that the sum converges absolutely. If $X$ is continuous and has the
probability density function $f_X$, then the expected value of $g(X)$ is
evaluated from


$$E[g(X)] = \int g(x) f_X(x)\,dx. \tag{2.9}$$


The general formula, covering both the discrete and continuous cases, is


$$E[g(X)] = \int g(x)\,dF_X(x), \tag{2.10}$$


where $F_X$ is the distribution function of the random variable $X$. Technically
speaking, the integral in (2.10) is a Lebesgue-Stieltjes integral. We do not
require knowledge of such integrals in this text, but interpret (2.10) to
signify (2.8) when $X$ is a discrete random variable, and to represent (2.9)
when $X$ possesses a probability density $f_X$.

Let $F_Y(y) = \Pr\{Y \le y\}$ denote the distribution function for $Y = g(X)$.
When $X$ is a discrete random variable, then


$$E[Y] = \sum_j y_j \Pr\{Y = y_j\} = \sum_i g(x_i) \Pr\{X = x_i\}$$


if $y_i = g(x_i)$ and provided that the second sum converges absolutely. In
general,


$$E[Y] = \int y\,dF_Y(y) = \int g(x)\,dF_X(x). \tag{2.11}$$


If X is a discrete random variable, then so is Y = g(X). It may be, how- 
ever, that X is a continuous random variable while Y is discrete (the reader 
should provide an example). Even so, one may compute E[Y] from either 
form in (2.11) with the same result. 
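
For a discrete case, the agreement of the two forms of $E[Y]$ can be verified directly. The sketch below uses a hypothetical $X$, uniform on {-1, 0, 1, 2}, with $g(x) = x^2$; because $g$ is not one-to-one, the distribution of $Y$ must pool the probabilities of all $x_i$ that map to the same value.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical example: X uniform on {-1, 0, 1, 2} and g(x) = x^2.
pmf_x = {x: Fraction(1, 4) for x in (-1, 0, 1, 2)}

def g(x):
    return x * x

# Distribution of Y = g(X): add up Pr{X = x_i} over all x_i with g(x_i) = y.
pmf_y = defaultdict(Fraction)
for x, p in pmf_x.items():
    pmf_y[g(x)] += p

mean_via_y = sum(y * p for y, p in pmf_y.items())     # sum of y Pr{Y = y}
mean_via_x = sum(g(x) * p for x, p in pmf_x.items())  # formula (2.8)
assert mean_via_y == mean_via_x
print(dict(pmf_y), mean_via_x)
```

The second form is usually the more convenient one, since it avoids deriving the distribution of $Y$ at all.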
2.4. Joint Distribution Functions


Given a pair $(X, Y)$ of random variables, their joint distribution function
is the function $F_{XY}$ of two real variables given by


$$F_{XY}(x, y) = F(x, y) = \Pr\{X \le x \text{ and } Y \le y\}.$$


Usually, the subscripts $X$, $Y$ will be omitted, unless ambiguity is possible.
A joint distribution function $F_{XY}$ is said to possess a (joint) probability
density if there exists a function $f_{XY}$ of two real variables for which


$$F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(\xi, \eta)\,d\eta\,d\xi \quad \text{for all } x, y.$$


The function $F_X(x) = \lim_{y \to \infty} F(x, y)$ is a distribution function, called the
marginal distribution function of $X$. Similarly, $F_Y(y) = \lim_{x \to \infty} F(x, y)$ is
the marginal distribution function of $Y$. If the distribution function $F$
possesses the joint density function $f$, then the marginal density functions for
$X$ and $Y$ are given, respectively, by


$$f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy \quad \text{and} \quad f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\,dx.$$


If X and Y are jointly distributed, then E[X + Y] = E[X] + E[Y], pro- 
vided only that all these moments exist. 


Independence 


If it happens that $F(x, y) = F_X(x) \times F_Y(y)$ for every choice of $x, y$, then the
random variables $X$ and $Y$ are said to be independent. If $X$ and $Y$ are
independent and possess a joint density function $f(x, y)$, then necessarily
$f(x, y) = f_X(x) f_Y(y)$ for all $x, y$.

Given jointly distributed random variables $X$ and $Y$ having means $\mu_X$
and $\mu_Y$ and finite variances, the covariance of $X$ and $Y$, written $\sigma_{XY}$ or
$\mathrm{Cov}[X, Y]$, is the product moment
$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$,
and $X$ and $Y$ are said to be uncorrelated if their covariance
is zero, that is, $\sigma_{XY} = 0$. Independent random variables having finite
variances are uncorrelated, but the converse is not true; there are uncorrelated
random variables that are not independent.
Dividing the covariance $\sigma_{XY}$ by the standard deviations $\sigma_X$ and $\sigma_Y$
defines the correlation coefficient $\rho = \sigma_{XY} / (\sigma_X \sigma_Y)$, for which $-1 \le \rho \le +1$.
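
A standard illustration of uncorrelated random variables that are not independent takes $X$ uniform on {-1, 0, 1} and $Y = X^2$. This example is an assumption introduced here, not one given in the text; the sketch checks both claims with exact arithmetic.

```python
from fractions import Fraction

# Hypothetical joint distribution: X uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(h):
    """E[h(X, Y)] for the joint probability mass function above."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

def pr(predicate):
    """Probability that (X, Y) satisfies the predicate."""
    return sum((p for (x, y), p in joint.items() if predicate(x, y)), Fraction(0))

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))
assert cov == expect(lambda x, y: x * y) - mu_x * mu_y == 0  # uncorrelated

# Not independent: Pr{X = 0, Y = 0} differs from Pr{X = 0} Pr{Y = 0}.
p_joint = pr(lambda x, y: x == 0 and y == 0)
p_product = pr(lambda x, y: x == 0) * pr(lambda x, y: y == 0)
assert p_joint != p_product
print(cov, p_joint, p_product)  # prints: 0 1/3 1/9
```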

The joint distribution function of any finite collection $X_1, \ldots, X_n$ of
random variables is defined as the function


$$F(x_1, \ldots, x_n) = F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \Pr\{X_1 \le x_1, \ldots, X_n \le x_n\}.$$

If $F(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n)$ for all values of $x_1, \ldots, x_n$, then the
random variables $X_1, \ldots, X_n$ are said to be independent.
A joint distribution function $F(x_1, \ldots, x_n)$ is said to have a probability
density function $f(\xi_1, \ldots, \xi_n)$ if


$$F(x_1, \ldots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} f(\xi_1, \ldots, \xi_n)\,d\xi_1 \cdots d\xi_n$$


for all values of $x_1, \ldots, x_n$.

Expectation 


For jointly distributed random variables $X_1, \ldots, X_n$ and arbitrary
functions $h_1, \ldots, h_m$ of $n$ variables each,


$$E\left[\sum_{j=1}^{m} h_j(X_1, \ldots, X_n)\right] = \sum_{j=1}^{m} E[h_j(X_1, \ldots, X_n)],$$


provided only that all these moments exist. 


2.5. Sums and Convolutions 


If $X$ and $Y$ are independent random variables having distribution functions
$F_X$ and $F_Y$, respectively, then the distribution function of their sum $Z = X + Y$
is the convolution of $F_X$ and $F_Y$:


$$F_Z(z) = \int_{-\infty}^{+\infty} F_X(z - \xi)\,dF_Y(\xi) = \int_{-\infty}^{+\infty} F_Y(z - \eta)\,dF_X(\eta). \tag{2.12}$$


If we specialize to the situation where $X$ and $Y$ have the probability
densities $f_X$ and $f_Y$, respectively, then the density function $f_Z$ of the sum
$Z = X + Y$ is the convolution of the densities $f_X$ and $f_Y$:


$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(z - \eta) f_Y(\eta)\,d\eta = \int_{-\infty}^{+\infty} f_Y(z - \xi) f_X(\xi)\,d\xi. \tag{2.13}$$


Where $X$ and $Y$ are nonnegative random variables, the range of integration
is correspondingly reduced to


$$f_Z(z) = \int_0^z f_X(z - \eta) f_Y(\eta)\,d\eta = \int_0^z f_Y(z - \xi) f_X(\xi)\,d\xi \quad \text{for } z \ge 0. \tag{2.14}$$
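
Formula (2.14) can be checked numerically in a case where the answer is known. The sketch below assumes two independent exponential random variables with a common hypothetical rate; their sum is known to have density $\lambda^2 z e^{-\lambda z}$. The midpoint rule is an arbitrary choice of quadrature made here for simplicity, and all names are illustrative.

```python
import math

RATE = 1.5  # hypothetical common rate of two independent exponential variables

def f_exp(x):
    """Density of the exponential distribution with rate RATE (zero for x < 0)."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def convolve_at(z, f_x, f_y, steps=2000):
    """Midpoint-rule approximation of the integral in (2.14) over 0 <= eta <= z."""
    if z <= 0:
        return 0.0
    h = z / steps
    return h * sum(f_x(z - (k + 0.5) * h) * f_y((k + 0.5) * h) for k in range(steps))

def f_sum_exact(z):
    """Known closed form for the sum of two such variables: RATE^2 z exp(-RATE z)."""
    return RATE**2 * z * math.exp(-RATE * z) if z >= 0 else 0.0

for z in (0.5, 1.0, 2.0):
    print(z, convolve_at(z, f_exp, f_exp), f_sum_exact(z))
```

The two printed columns should agree closely, because for exponential densities the integrand in (2.14) does not depend on the integration variable.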


If $X$ and $Y$ are independent and have respective variances $\sigma_X^2$ and $\sigma_Y^2$,
then the variance of the sum $Z = X + Y$ is the sum of the variances:
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$. More generally, if $X_1, \ldots, X_n$ are independent random variables
having variances $\sigma_1^2, \ldots, \sigma_n^2$, respectively, then the variance of the sum
$Z = X_1 + \cdots + X_n$ is $\sigma_Z^2 = \sigma_1^2 + \cdots + \sigma_n^2$.
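
The additivity of variances can be confirmed exactly for a small discrete case. The sketch assumes two independent fair dice, a hypothetical example; the distribution of the sum is built from products of probabilities, which is where independence enters.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: X and Y are independent fair dice.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def variance(pmf):
    """Var[X] = E[X^2] - (E[X])^2 for a probability mass function given as a dict."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean**2

# Independence: Pr{X = x, Y = y} is the product of the marginal probabilities.
pmf_sum = {}
for (x, px), (y, py) in product(die.items(), repeat=2):
    pmf_sum[x + y] = pmf_sum.get(x + y, Fraction(0)) + px * py

assert variance(pmf_sum) == variance(die) + variance(die)
print(variance(die), variance(pmf_sum))  # prints: 35/12 35/6
```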


2.6. Change of Variable 


Suppose that X is a random variable with probability density function $f_X$
and that g is a strictly increasing differentiable function. Then Y = g(X)
defines a random variable, and the event $\{Y \le y\}$ is the same as the event
$\{X \le g^{-1}(y)\}$, where $g^{-1}$ is the inverse function to g; i.e.,
y = g(x) if and only if $x = g^{-1}(y)$. Thus we obtain the correspondence
$F_Y(y) = \Pr\{Y \le y\} = \Pr\{X \le g^{-1}(y)\} = F_X(g^{-1}(y))$ between the
distribution function of Y and that of X. Recall the differential calculus
formula

$$
\frac{dg^{-1}(y)}{dy} = \frac{1}{g'(x)} = \frac{1}{dg/dx}, \qquad \text{where } y = g(x),
$$

and use this in the chain rule of differentiation to obtain

$$
f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{dF_X(g^{-1}(y))}{dy} = f_X(x)\,\frac{1}{g'(x)}, \qquad \text{where } y = g(x).
$$

The formula

$$
f_Y(y) = \frac{1}{g'(x)}\, f_X(x), \qquad \text{where } y = g(x), \tag{2.15}
$$



expresses the density function for Y in terms of the density for X when g
is strictly increasing and differentiable.


2.7. Conditional Probability


For any events A and B, the conditional probability of A given B is written
$\Pr\{A \mid B\}$ and defined by

$$
\Pr\{A \mid B\} = \frac{\Pr\{A \cap B\}}{\Pr\{B\}} \qquad \text{if } \Pr\{B\} > 0, \tag{2.16}
$$

and is left undefined if Pr{B} = 0. [When Pr{B} = 0, the right side of
(2.16) is the indeterminate quantity $\frac{0}{0}$.]
In stochastic modeling, conditional probabilities are rarely procured via
(2.16) but instead are dictated as primary data by the circumstances of the
application, and then (2.16) is applied in its equivalent multiplicative form

$$
\Pr\{A \cap B\} = \Pr\{A \mid B\}\,\Pr\{B\} \tag{2.17}
$$

to compute other probabilities. (An example follows shortly.) Central in
this role is the law of total probability, which results from substituting
$\Pr\{A \cap B_i\} = \Pr\{A \mid B_i\}\,\Pr\{B_i\}$ into
$\Pr\{A\} = \sum_{i=1}^{\infty} \Pr\{A \cap B_i\}$, where
$\Omega = B_1 \cup B_2 \cup \cdots$ and $B_i \cap B_j = \emptyset$ if
$i \ne j$ (cf. Section 2.1), to yield

$$
\Pr\{A\} = \sum_{i=1}^{\infty} \Pr\{A \mid B_i\}\,\Pr\{B_i\}. \tag{2.18}
$$


Example Gold and silver coins are allocated among three urns labeled
I, II, III according to the following table:


Urn     Number of Gold Coins     Number of Silver Coins

I                4                         8
II               3                         9
III              6                         6

An urn is selected at random, all urns being equally likely, and then a coin 
is selected at random from that urn. Using the notation I, II, III for the 
events of selecting urns I, II, and III, respectively, and G for the event of 
selecting a gold coin, then the problem description provides the following 
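probabilities and conditional probabilities as data:

$$
\begin{aligned}
\Pr\{\mathrm{I}\} &= \tfrac{1}{3}, & \Pr\{G \mid \mathrm{I}\} &= \tfrac{4}{12},\\
\Pr\{\mathrm{II}\} &= \tfrac{1}{3}, & \Pr\{G \mid \mathrm{II}\} &= \tfrac{3}{12},\\
\Pr\{\mathrm{III}\} &= \tfrac{1}{3}, & \Pr\{G \mid \mathrm{III}\} &= \tfrac{6}{12},
\end{aligned}
$$

and we calculate the probability of selecting a gold coin according to
(2.18), viz.

$$
\begin{aligned}
\Pr\{G\} &= \Pr\{G \mid \mathrm{I}\}\Pr\{\mathrm{I}\} + \Pr\{G \mid \mathrm{II}\}\Pr\{\mathrm{II}\} + \Pr\{G \mid \mathrm{III}\}\Pr\{\mathrm{III}\}\\
&= \tfrac{4}{12}\left(\tfrac{1}{3}\right) + \tfrac{3}{12}\left(\tfrac{1}{3}\right) + \tfrac{6}{12}\left(\tfrac{1}{3}\right) = \tfrac{13}{36}.
\end{aligned}
$$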


As seen here, more often than not conditional probabilities are given as 
data and are not the end result of calculation. 
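The urn calculation is a direct instance of (2.18) and can be reproduced in a
few lines. This is only a sketch: the dictionary layout and the variable names
are our own, while the coin counts are those of the table in the example.

```python
from fractions import Fraction

# (gold, silver) coin counts for each urn, taken from the table in the example.
urns = {"I": (4, 8), "II": (3, 9), "III": (6, 6)}

# Each urn is equally likely to be selected.
pr_urn = {name: Fraction(1, len(urns)) for name in urns}

# Conditional probability of a gold coin given the urn.
pr_gold_given_urn = {name: Fraction(g, g + s) for name, (g, s) in urns.items()}

# Law of total probability (2.18).
pr_gold = sum(pr_gold_given_urn[u] * pr_urn[u] for u in urns)
print(pr_gold)
```

Note that the conditional probabilities enter as data read off the table, and
only the unconditional probability is computed.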

Discussion of conditional distributions and conditional expectation 
merits an entire chapter (Chapter II). 


2.8. Review of Axiomatic Probability Theory*


For the most part, this book studies random variables only through their 
distributions. In this spirit, we defined a random variable as a variable that 
takes on its values by chance. For some purposes, however, a little more 
precision and structure are needed. 

Recall that the basic elements of probability theory are


1. the sample space, a set $\Omega$ whose elements $\omega$ correspond to the
   possible outcomes of an experiment;

2. the family of events, a collection $\mathcal{F}$ of subsets A of $\Omega$:
   we say that the event A occurs if the outcome $\omega$ of the experiment is
   an element of A; and

3. the probability measure, a function P defined on $\mathcal{F}$ and satisfying

   (a) $0 = P[\emptyset] \le P[A] \le P[\Omega] = 1$ for $A \in \mathcal{F}$
       ($\emptyset$ = the empty set)

   and

   (b) $$P\Big[\bigcup_{n=1}^{\infty} A_n\Big] = \sum_{n=1}^{\infty} P[A_n] \tag{2.19}$$

       if the events $A_1, A_2, \ldots$ are disjoint; i.e., if
       $A_i \cap A_j = \emptyset$ when $i \ne j$.

The triple $(\Omega, \mathcal{F}, P)$ is called a probability space.


* The material included in this review of axiomatic probability theory is not
used in the remainder of the book. It is included in this review chapter only
for the sake of completeness.


Example When there are only a denumerable number of possible outcomes,
say $\Omega = \{\omega_1, \omega_2, \ldots\}$, we may take $\mathcal{F}$ to be
the collection of all subsets of $\Omega$. If $p_1, p_2, \ldots$ are
nonnegative numbers with $\sum_n p_n = 1$, the assignment

$$
P[A] = \sum_{\omega_i \in A} p_i
$$

determines a probability measure defined on $\mathcal{F}$.

It is not always desirable, consistent, or feasible to take the family of
events as the collection of all subsets of $\Omega$. Indeed, when $\Omega$ is
nondenumerably infinite, it may not be possible to define a probability
measure on the collection of all subsets maintaining the properties of (2.19).
In whatever way we prescribe $\mathcal{F}$ such that (2.19) holds, the family
of events $\mathcal{F}$ should satisfy


(a) $\emptyset$ is in $\mathcal{F}$ and $\Omega$ is in $\mathcal{F}$;

(b) $A^c$ is in $\mathcal{F}$ whenever A is in $\mathcal{F}$, where
    $A^c = \{\omega \in \Omega;\ \omega \notin A\}$ is the complement of A; and        (2.20)

(c) $\bigcup_{n=1}^{\infty} A_n$ is in $\mathcal{F}$ whenever $A_n$ is in
    $\mathcal{F}$ for n = 1, 2, ....


A collection $\mathcal{F}$ of subsets of a set $\Omega$ satisfying (2.20) is
called a $\sigma$-algebra. If $\mathcal{F}$ is a $\sigma$-algebra, then

$$
\bigcap_{n=1}^{\infty} A_n = \Big(\bigcup_{n=1}^{\infty} A_n^c\Big)^c
$$

is in $\mathcal{F}$ whenever $A_n$ is in $\mathcal{F}$ for n = 1, 2, ....
Manifestly, as a consequence we find that finite unions and finite
intersections of members of $\mathcal{F}$ are maintained in $\mathcal{F}$.

In this framework, a real random variable X is a real-valued function
defined on $\Omega$ fulfilling certain “measurability” conditions given here.
The distribution function of the random variable X is formally given by

$$
\Pr\{a < X \le b\} = P[\{\omega;\ a < X(\omega) \le b\}]. \tag{2.21}
$$

In words, the probability that the random variable X takes a value in (a, b]
is calculated as the probability of the set of outcomes $\omega$ for which
$a < X(\omega) \le b$. If relation (2.21) is to have meaning, X cannot be an
arbitrary function on $\Omega$, but must satisfy the condition that

$$
\{\omega;\ a < X(\omega) \le b\} \text{ is in } \mathcal{F} \text{ for all real } a < b,
$$

since $\mathcal{F}$ embodies the only sets A for which P[A] is defined. In
fact, by exploiting the properties (2.20) of the $\sigma$-algebra
$\mathcal{F}$, we find that it is enough to require

$$
\{\omega;\ X(\omega) \le x\} \text{ is in } \mathcal{F} \text{ for all real } x.
$$


Let $\mathcal{B}$ be any $\sigma$-algebra of subsets of $\Omega$. We say that
X is measurable with respect to $\mathcal{B}$, or more briefly
$\mathcal{B}$-measurable, if

$$
\{\omega;\ X(\omega) \le x\} \text{ is in } \mathcal{B} \text{ for all real } x.
$$

Thus, every real random variable is by definition $\mathcal{F}$-measurable.
There may, in general, be smaller $\sigma$-algebras with respect to which X is
also measurable.

The $\sigma$-algebra generated by a random variable X is defined to be the
smallest $\sigma$-algebra with respect to which X is measurable. It is denoted
by $\mathcal{F}(X)$ and consists exactly of those sets A that are in every
$\sigma$-algebra $\mathcal{B}$ for which X is $\mathcal{B}$-measurable. For
example, if X has only denumerably many possible values $x_1, x_2, \ldots$,
the sets

$$
A_i = \{\omega;\ X(\omega) = x_i\}, \qquad i = 1, 2, \ldots,
$$

form a countable partition of $\Omega$, i.e.,

$$
\Omega = \bigcup_{i=1}^{\infty} A_i \qquad \text{and} \qquad A_i \cap A_j = \emptyset \ \text{ if } i \ne j,
$$

and then $\mathcal{F}(X)$ includes precisely $\emptyset$, $\Omega$, and every
set that is the union of some of the $A_i$'s.


Example For the reader completely unfamiliar with this framework,
the following simple example will help illustrate the concepts. The
experiment consists in tossing a nickel and a dime and observing “heads” or
“tails.” We take $\Omega$ to be

$$
\Omega = \{(H, H), (H, T), (T, H), (T, T)\},
$$

where, for example, (H, T) stands for the outcome “nickel = heads, and
dime = tails.” We will take the collection of all subsets of $\Omega$ as the
family of events. Assuming each outcome in $\Omega$ to be equally likely, we
arrive at the probability measure:


A in F                  P[A]     A in F                       P[A]

(empty set)             0        Omega                        1
{(H, H)}                1/4      {(H, T), (T, H), (T, T)}     3/4
{(H, T)}                1/4      {(H, H), (T, H), (T, T)}     3/4
{(T, H)}                1/4      {(H, H), (H, T), (T, T)}     3/4
{(T, T)}                1/4      {(H, H), (H, T), (T, H)}     3/4
{(H, H), (H, T)}        1/2      {(T, H), (T, T)}             1/2
{(H, H), (T, H)}        1/2      {(H, T), (T, T)}             1/2
{(H, H), (T, T)}        1/2      {(H, T), (T, H)}             1/2


The event “nickel is heads” is {(H, H), (H, T)} and has, according to the
table, probability 1/2, as it should.

Let $X_1$ be 1 if the nickel is heads, and 0 otherwise; let $X_2$ be the
corresponding random variable for the dime; and let $Z = X_1 + X_2$ be the
total number of heads. As functions on $\Omega$, we have


omega in Omega     X_1(omega)     X_2(omega)     Z(omega)

(H, H)                 1              1             2
(H, T)                 1              0             1
(T, H)                 0              1             1
(T, T)                 0              0             0


Finally, the $\sigma$-algebras generated by $X_1$ and Z are

$$
\mathcal{F}(X_1) = \{\emptyset,\ \Omega,\ \{(H, H), (H, T)\},\ \{(T, H), (T, T)\}\},
$$

and

$$
\begin{aligned}
\mathcal{F}(Z) = \{&\emptyset,\ \Omega,\ \{(H, H)\},\ \{(H, T), (T, H)\},\ \{(T, T)\},\\
&\{(H, T), (T, H), (T, T)\},\ \{(H, H), (T, T)\},\\
&\{(H, H), (H, T), (T, H)\}\}.
\end{aligned}
$$

$\mathcal{F}(X_1)$ contains four sets and $\mathcal{F}(Z)$ contains eight. Is
$X_1$ measurable with respect to $\mathcal{F}(Z)$, or vice versa?
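On a finite sample space, the generated sigma-algebra can be enumerated
mechanically as all unions of the level sets of the random variable. The sketch
below does this for the nickel-and-dime example; the function name
`generated_sigma_algebra` and the encoding of outcomes as pairs of strings are
our own illustrative choices, not notation from the text.

```python
from itertools import combinations

# Outcomes are (nickel, dime) pairs.
omega = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]


def generated_sigma_algebra(rv):
    """All unions of the level sets {w : rv(w) = x} of rv on the finite set omega."""
    levels = {}
    for w in omega:
        levels.setdefault(rv(w), set()).add(w)
    blocks = list(levels.values())
    sets = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            sets.add(frozenset().union(*chosen))
    return sets


def x1(w):
    """1 if the nickel is heads, 0 otherwise."""
    return 1 if w[0] == "H" else 0


def x2(w):
    """1 if the dime is heads, 0 otherwise."""
    return 1 if w[1] == "H" else 0


def z(w):
    """Total number of heads."""
    return x1(w) + x2(w)


f_x1 = generated_sigma_algebra(x1)
f_z = generated_sigma_algebra(z)
print(len(f_x1), len(f_z))   # sizes of the two generated sigma-algebras
```

Measurability of one variable with respect to the sigma-algebra generated by
another can then be examined by testing whether each of its level sets belongs
to that collection.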

Every pair X, Y of random variables determines a $\sigma$-algebra called the
$\sigma$-algebra generated by X, Y. It is the smallest $\sigma$-algebra with
respect to which both X and Y are measurable. This $\sigma$-algebra comprises
exactly those sets A that are in every $\sigma$-algebra $\mathcal{B}$ for which
X and Y are both $\mathcal{B}$-measurable. If both X and Y assume only
denumerably many possible values, say $x_1, x_2, \ldots$ and
$y_1, y_2, \ldots$, respectively, then the sets

$$
A_{ij} = \{\omega;\ X(\omega) = x_i,\ Y(\omega) = y_j\}, \qquad i, j = 1, 2, \ldots,
$$

present a countable partition of $\Omega$, and $\mathcal{F}(X, Y)$ consists
precisely of $\emptyset$, $\Omega$, and every set that is the union of some of
the $A_{ij}$'s. Observe that X is measurable with respect to
$\mathcal{F}(X, Y)$, and thus $\mathcal{F}(X) \subset \mathcal{F}(X, Y)$.

More generally, let $\{X(t);\ t \in T\}$ be any family of random variables.
Then the $\sigma$-algebra generated by $\{X(t);\ t \in T\}$ is the smallest
$\sigma$-algebra with respect to which every random variable X(t),
$t \in T$, is measurable. It is denoted by $\mathcal{F}\{X(t);\ t \in T\}$.

A special role is played by a distinguished $\sigma$-algebra of sets of real
numbers. The $\sigma$-algebra of Borel sets is the $\sigma$-algebra generated
by the identity function f(x) = x, for $x \in (-\infty, \infty)$.
Alternatively, the $\sigma$-algebra of Borel sets is the smallest
$\sigma$-algebra containing every interval of the form (a, b],
$-\infty \le a \le b < +\infty$. A real-valued function of a real variable
is said to be Borel measurable if it is measurable with respect to the
$\sigma$-algebra of Borel sets.


Exercises 


2.1. Let A and B be arbitrary, not necessarily disjoint, events. Use the
law of total probability to verify the formula

$$
\Pr\{A\} = \Pr\{AB\} + \Pr\{AB^c\},
$$

where $B^c$ is the complementary event to B. (That is, $B^c$ occurs if and
only if B does not occur.)


2.2. Let A and B be arbitrary, not necessarily disjoint, events. Establish
the general addition law

$$
\Pr\{A \cup B\} = \Pr\{A\} + \Pr\{B\} - \Pr\{AB\}.
$$

Hint: Apply the result of Exercise 2.1 to evaluate
$\Pr\{AB^c\} = \Pr\{A\} - \Pr\{AB\}$. Then apply the addition law to the
disjoint events AB and $AB^c$, noting that $A = (AB) \cup (AB^c)$.


2.3. 
(a) Plot the distribution function 


0 forx = 0, 
ry = | forO0<x<1, 
l for x = 1. 
(b) Determine the corresponding density function f(x) in the three re- 
gions (1) x = 0, (11) 0 < x < 1, and (iii) 1 Sx. 
(c) What is the mean of the distribution? _ 
(d) If X is arandom variable following the distribution specified in (a), 
evaluate Pr{i = X = 3}. 


2.4. Let Z be a discrete random variable having possible values 0, 1, 2, 
and 3 and probability mass function 


p(O)=3, ~=——p(2) = 3, 
PQ) =3,  ~p(3) =3. 


(a) Plot the corresponding distribution function. 
(b) Determine the mean E[Z]. 
(c) Evaluate the variance Var[Z]. 


2.5. Let A, B, and C be arbitrary events. Establish the addition law

$$
\Pr\{A \cup B \cup C\} = \Pr\{A\} + \Pr\{B\} + \Pr\{C\} - \Pr\{AB\} - \Pr\{AC\} - \Pr\{BC\} + \Pr\{ABC\}.
$$


2.6. Let X and Y be independent random variables having distribution
functions $F_X$ and $F_Y$, respectively.


(a) Define Z = max{X, Y} to be the larger of the two. Show that
    $F_Z(z) = F_X(z) F_Y(z)$ for all z.

(b) Define W = min{X, Y} to be the smaller of the two. Show that
    $F_W(w) = 1 - [1 - F_X(w)][1 - F_Y(w)]$ for all w.


2.7. Suppose X is a random variable having the probability density
function

$$
f_X(x) = \begin{cases} R x^{R-1} & \text{for } 0 \le x \le 1,\\ 0 & \text{elsewhere}, \end{cases}
$$

where R > 0 is a fixed parameter.


(a) Determine the distribution function F,(x). 
(b) Determine the mean E[X]. 
(c) Determine the variance Var[X ]. 


2.8. A random variable V has the distribution function

$$
F(v) = \begin{cases} 0 & \text{for } v < 0,\\ 1 - (1 - v)^A & \text{for } 0 \le v \le 1,\\ 1 & \text{for } v > 1, \end{cases}
$$

where A > 0 is a parameter. Determine the density function, mean, and
variance.


2.9. Determine the distribution function, mean, and variance
corresponding to the triangular density

$$
f(x) = \begin{cases} x & \text{for } 0 \le x \le 1,\\ 2 - x & \text{for } 1 \le x \le 2,\\ 0 & \text{elsewhere}. \end{cases}
$$


2.10. Let 1{A} be the indicator random variable associated with an
event A, defined to be one if A occurs, and zero otherwise. Define $A^c$, the
complement of event A, to be the event that occurs when A does not occur.
Show


(a) $\mathbf{1}\{A^c\} = 1 - \mathbf{1}\{A\}$
(b) $\mathbf{1}\{A \cap B\} = \mathbf{1}\{A\}\,\mathbf{1}\{B\} = \min\{\mathbf{1}\{A\}, \mathbf{1}\{B\}\}$
(c) $\mathbf{1}\{A \cup B\} = \max\{\mathbf{1}\{A\}, \mathbf{1}\{B\}\}$.

Problems 
2.1. Thirteen cards numbered 1, ..., 13 are shuffled and dealt one at a
time. Say a match occurs on deal k if the kth card revealed is card number
k. Let N be the total number of matches that occur in the thirteen cards.
Determine E[N].


Hint: Write $N = \mathbf{1}\{A_1\} + \cdots + \mathbf{1}\{A_{13}\}$, where
$A_k$ is the event that a match occurs on deal k.


2.2. Let N cards carry the distinct numbers $x_1, \ldots, x_N$. If two cards
are drawn at random without replacement, show that the correlation
coefficient $\rho$ between the numbers appearing on the two cards is
-1/(N - 1).


2.3. A population having N distinct elements is sampled with replacement.
Because of repetitions, a random sample of size r may contain fewer than r
distinct elements. Let $S_r$ be the sample size necessary to get r distinct
elements. Show that

$$
E[S_r] = N\left(\frac{1}{N} + \frac{1}{N-1} + \cdots + \frac{1}{N-r+1}\right).
$$


2.4. A fair coin is tossed until the first time that the same side appears 
twice in succession. Let N be the number of tosses required. 


(a) Determine the probability mass function for N. 
(b) Let A be the event that N is even and B be the event that N = 6. 
Evaluate Pr{A}, Pr{B}, and Pr{AB}. 


2.5. Two players, A and B, take turns on a gambling machine until one
of them scores a success, the first to do so being the winner. Their
probabilities for success on a single play are p for A and q for B, and
successive plays are independent.


(a) Determine the probability that A wins the contest given that A plays
    first.
(b) Determine the mean number of plays required, given that A wins.


2.6. A pair of dice is tossed. If the two outcomes are equal, the dice are 
tossed again, and the process repeated. If the dice are unequal, their sum 
is recorded. Determine the probability mass function for the sum. 


2.7. Let U and W be jointly distributed random variables. Show that U
and W are independent if

$$
\Pr\{U > u \text{ and } W > w\} = \Pr\{U > u\}\,\Pr\{W > w\} \qquad \text{for all } u, w.
$$


2.8. Suppose X is a random variable with finite mean $\mu$ and variance
$\sigma^2$, and Y = a + bX for certain constants a, $b \ne 0$. Determine the
mean and variance for Y.


2.9. Determine the mean and variance for the probability mass function

$$
p(k) = \frac{2(n-k)}{n(n-1)} \qquad \text{for } k = 1, 2, \ldots, n.
$$


2.10. Random variables X and Y are independent and have the proba- 
bility mass functions 


px0) =3, py) =%, 
Px3) = 3, py(2) = 3, 
Py(3) = 3. 


Determine the probability mass function of the sum Z = X + Y. 


2.11. Random variables U and V are independent and have the proba- 
bility mass functions 


PO) =3, py) =3 
Py) = %» P2) = ‘. 
P(2) = - 


Determine the probability mass function of the sum W = U + V. 


2.12. Let U, V, and W be independent random variables with equal
variances $\sigma^2$. Define X = U + W and Y = V - W. Find the covariance
between X and Y.


2.13. Let X and Y be independent random variables each with the uniform
probability density function

$$
f(x) = \begin{cases} 1 & \text{for } 0 < x < 1,\\ 0 & \text{elsewhere}. \end{cases}
$$

Find the joint probability density function of U and V, where
U = max{X, Y} and V = min{X, Y}.


3. The Major Discrete Distributions


The most important discrete probability distributions and their relevant 
properties are summarized in this section. The exposition is brief, since 
most readers will be familiar with this material from an earlier course in 
probability. 


3.1. Bernoulli Distribution 


A random variable X following the Bernoulli distribution with parameter
p has only two possible values, 0 and 1, and the probability mass function
is p(1) = p and p(0) = 1 - p, where 0 < p < 1. The mean and variance
are E[X] = p and Var[X] = p(1 - p), respectively.

Bernoulli random variables occur frequently as indicators of events.
The indicator of an event A is the random variable

$$
\mathbf{1}(A) = \mathbf{1}_A = \begin{cases} 1 & \text{if } A \text{ occurs},\\ 0 & \text{if } A \text{ does not occur}. \end{cases} \tag{3.1}
$$

Then $\mathbf{1}_A$ is a Bernoulli random variable with parameter
$p = E[\mathbf{1}_A] = \Pr\{A\}$.

The simple expedient of using indicators often reduces formidable
calculations into trivial ones. For example, let $a_1, a_2, \ldots, a_n$ be
arbitrary real numbers and $A_1, A_2, \ldots, A_n$ be events, and consider the
problem of showing that

$$
\sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \Pr\{A_i \cap A_j\} \ge 0. \tag{3.2}
$$

Attacked directly, the problem is difficult. But bringing in the indicators
$\mathbf{1}(A_i)$ and observing that

$$
\begin{aligned}
0 \le \Big\{\sum_{i=1}^{n} a_i \mathbf{1}(A_i)\Big\}^2
&= \Big\{\sum_{i=1}^{n} a_i \mathbf{1}(A_i)\Big\}\Big\{\sum_{j=1}^{n} a_j \mathbf{1}(A_j)\Big\}\\
&= \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \mathbf{1}(A_i)\mathbf{1}(A_j)
 = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \mathbf{1}(A_i \cap A_j)
\end{aligned}
$$

gives, after taking expectations,

$$
0 \le E\Big[\Big\{\sum_{i=1}^{n} a_i \mathbf{1}(A_i)\Big\}^2\Big]
= \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j E[\mathbf{1}(A_i \cap A_j)]
= \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j \Pr\{A_i \cap A_j\},
$$

and the demonstration of (3.2) is complete.


3.2. Binomial Distribution 


Consider independent events $A_1, A_2, \ldots, A_n$, all having the same
probability $p = \Pr\{A_i\}$ of occurrence. Let Y count the total number of
events among $A_1, \ldots, A_n$ that occur. Then Y has a binomial distribution
with parameters n and p. The probability mass function is

$$
p_Y(k) = \Pr\{Y = k\} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k} \qquad \text{for } k = 0, 1, \ldots, n. \tag{3.3}
$$

Writing Y as a sum of indicators in the form
$Y = \mathbf{1}(A_1) + \cdots + \mathbf{1}(A_n)$ makes it easy to determine the
moments

$$
E[Y] = E[\mathbf{1}(A_1)] + \cdots + E[\mathbf{1}(A_n)] = np,
$$

and using independence, we can also determine that

$$
\mathrm{Var}[Y] = \mathrm{Var}[\mathbf{1}(A_1)] + \cdots + \mathrm{Var}[\mathbf{1}(A_n)] = np(1-p).
$$


Briefly, we think of a binomial random variable as counting the number 
of “successes” in n independent trials where there is a constant probabil- 
ity p of success on any single trial. 
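As a quick numerical check of (3.3) and of the moments np and np(1 - p), one
may tabulate the mass function and sum directly. The sketch below assumes
illustrative parameter values n = 10 and p = 0.3, which are ours; the function
name `binomial_pmf` is likewise only a label for this example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Pr{Y = k} from (3.3); comb(n, k) equals n! / (k! (n - k)!)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3   # illustrative values, not taken from the text
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(sum(pmf))                 # total probability, equal to 1 up to rounding
print(mean, n * p)              # mean compared with np
print(var, n * p * (1 - p))     # variance compared with np(1 - p)
```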


3.3. Geometric and Negative Binomial Distributions 


Let $A_1, A_2, \ldots$ be independent events having a common probability
$p = \Pr\{A_i\}$ of occurrence. Say that trial k is a success (S) or failure
(F) according as $A_k$ occurs or not, and let Z count the number of failures
prior to the first success. To be precise, Z = k if and only if
$\mathbf{1}(A_1) = 0, \ldots, \mathbf{1}(A_k) = 0$, and
$\mathbf{1}(A_{k+1}) = 1$. Then Z has a geometric distribution with parameter
p. The probability mass function is

$$
p_Z(k) = p(1-p)^k \qquad \text{for } k = 0, 1, \ldots, \tag{3.4}
$$

and the first two moments are

$$
E[Z] = \frac{1-p}{p}; \qquad \mathrm{Var}[Z] = \frac{1-p}{p^2}.
$$

Sometimes the term “geometric distribution” is used in referring to the
probability mass function

$$
p_{Z'}(k) = p(1-p)^{k-1} \qquad \text{for } k = 1, 2, \ldots. \tag{3.5}
$$

This is merely the distribution of the random variable Z' = 1 + Z, the
number of trials until the first success. Hence E[Z'] = 1 + E[Z] = 1/p,
and $\mathrm{Var}[Z'] = \mathrm{Var}[Z] = (1-p)/p^2$.

Now fix an integer $r \ge 1$ and let $W_r$ count the number of failures
observed before the rth success in $A_1, A_2, \ldots$. Then $W_r$ has a
negative binomial distribution with parameters r and p. The event $W_r = k$
calls for (A) exactly r - 1 successes in the first k + r - 1 trials, followed
by (B) a success on trial k + r. The probability for (A) is obtained from a
binomial distribution, and the probability for (B) is simply p, which leads to
the following probability mass function for $W_r$:

$$
p(k) = \Pr\{W_r = k\} = \frac{(k+r-1)!}{(r-1)!\,k!}\, p^r (1-p)^k, \qquad k = 0, 1, \ldots. \tag{3.6}
$$

Another way of writing $W_r$ is as the sum $W_r = Z_1 + \cdots + Z_r$, where
$Z_1, \ldots, Z_r$ are independent random variables each having the geometric
distribution of (3.4). This formulation readily yields the moments

$$
E[W_r] = \frac{r(1-p)}{p}; \qquad \mathrm{Var}[W_r] = \frac{r(1-p)}{p^2}. \tag{3.7}
$$
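The mass function (3.6) and the moments (3.7) can be compared numerically by
truncating the infinite sums. This is a sketch under stated assumptions: the
parameter values r = 3 and p = 0.25 and the truncation point are our own
choices, and the truncation is harmless only because the tail of the series is
negligibly small for these values.

```python
from math import comb


def neg_binomial_pmf(k, r, p):
    """Pr{W_r = k} from (3.6); comb(k + r - 1, k) equals (k + r - 1)! / ((r - 1)! k!)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


r, p = 3, 0.25      # illustrative values, not taken from the text
cutoff = 2000       # truncation point for the infinite sums (an assumption)

mean = sum(k * neg_binomial_pmf(k, r, p) for k in range(cutoff))
second = sum(k * k * neg_binomial_pmf(k, r, p) for k in range(cutoff))
var = second - mean**2

print(mean, r * (1 - p) / p)        # compare with the mean in (3.7)
print(var, r * (1 - p) / p**2)      # compare with the variance in (3.7)
```

Setting r = 1 recovers the geometric mass function (3.4), since the binomial
coefficient then equals one.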


3.4. The Poisson Distribution 


If distributions were graded on a scale of one to ten, the Poisson clearly
merits a ten. It plays a role in the class of discrete distributions that
parallels in some sense that of the normal distribution in the continuous class.
The Poisson distribution occurs often in natural phenomena, for powerful
and convincing reasons (the law of rare events, see later in this section).
At the same time, the Poisson distribution has many elegant and surprising
mathematical properties that make analysis a pleasure.

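The Poisson distribution with parameter $\lambda > 0$ has the probability mass
function

$$
p(k) = \frac{\lambda^k e^{-\lambda}}{k!} \qquad \text{for } k = 0, 1, \ldots. \tag{3.8}
$$

Using the series expansion

$$
e^{\lambda} = 1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots \tag{3.9}
$$

we see that $\sum_{k \ge 0} p(k) = 1$. The same series helps calculate the mean
via

$$
\sum_{k=0}^{\infty} k\,p(k) = \sum_{k=1}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!}
= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda.
$$

The same trick works on the variance, beginning with

$$
\sum_{k=0}^{\infty} k(k-1)\,p(k) = \sum_{k=2}^{\infty} k(k-1)\,\frac{\lambda^k e^{-\lambda}}{k!}
= \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} = \lambda^2.
$$

Written in terms of a random variable X having the Poisson distribution
with parameter $\lambda$, we have just calculated $E[X] = \lambda$ and
$E[X(X-1)] = \lambda^2$, whence
$E[X^2] = E[X(X-1)] + E[X] = \lambda^2 + \lambda$ and
$\mathrm{Var}[X] = E[X^2] - \{E[X]\}^2 = \lambda$. That is, the mean and
variance are both the same and equal to the parameter $\lambda$ of the Poisson
distribution.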
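The three sums just evaluated can be confirmed numerically by truncating the
series. In this sketch the value of the parameter (3.5) and the truncation
point (100 terms) are our own assumptions; the neglected tail is far below
floating-point precision for this parameter value.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """p(k) from (3.8)."""
    return lam**k * exp(-lam) / factorial(k)


lam = 3.5        # illustrative parameter value
terms = 100      # truncation point for the infinite sums (an assumption)

total = sum(poisson_pmf(k, lam) for k in range(terms))
mean = sum(k * poisson_pmf(k, lam) for k in range(terms))
factorial_moment = sum(k * (k - 1) * poisson_pmf(k, lam) for k in range(terms))
var = factorial_moment + mean - mean**2

print(total)                       # close to 1
print(mean, lam)                   # E[X] compared with the parameter
print(factorial_moment, lam**2)    # E[X(X - 1)] compared with its square
print(var, lam)                    # Var[X] compared with the parameter
```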

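The simplest form of the law of rare events asserts that the binomial
distribution with parameters n and p converges to the Poisson with
parameter $\lambda$ if $n \to \infty$ and $p \to 0$ in such a way that
$\lambda = np$ remains constant. In words, given an indefinitely large number
of independent trials, where success on each trial occurs with the same
arbitrarily small probability, then the total number of successes will follow,
approximately, a Poisson distribution.

The proof is a relatively simple manipulation of limits. We begin by
writing the binomial distribution in the form

$$
\Pr\{X = k\} = \frac{n!}{k!\,(n-k)!}\, p^k (1-p)^{n-k}
= n(n-1)\cdots(n-k+1)\,\frac{p^k (1-p)^n}{k!\,(1-p)^k}
$$

and then substitute $p = \lambda/n$ to get

$$
\Pr\{X = k\} = n(n-1)\cdots(n-k+1)\,\frac{(\lambda/n)^k (1-\lambda/n)^n}{k!\,(1-\lambda/n)^k}
= \frac{\lambda^k}{k!}\left[1\cdot\Big(1-\frac{1}{n}\Big)\cdots\Big(1-\frac{k-1}{n}\Big)\right]\frac{(1-\lambda/n)^n}{(1-\lambda/n)^k}.
$$

Now let $n \to \infty$ and observe that

$$
1\cdot\Big(1-\frac{1}{n}\Big)\cdots\Big(1-\frac{k-1}{n}\Big) \to 1, \qquad
\Big(1-\frac{\lambda}{n}\Big)^n \to e^{-\lambda},
$$

and

$$
\Big(1-\frac{\lambda}{n}\Big)^k \to 1
$$

to obtain the Poisson distribution

$$
\Pr\{X = k\} = \frac{\lambda^k e^{-\lambda}}{k!} \qquad \text{for } k = 0, 1, \ldots
$$

in the limit. Extended forms of the law of rare events are presented in V.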
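The convergence can be observed numerically by holding np fixed and letting n
grow. The following sketch assumes the illustrative value 2 for the Poisson
parameter and compares the two mass functions on k = 0, ..., 10; the function
names and the choice of grid are ours.

```python
from math import comb, exp, factorial

lam = 2.0   # illustrative value of the fixed product n * p


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)


def max_gap(n, k_max=10):
    """Largest pointwise difference between the two mass functions for k <= k_max."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(k_max + 1))


gaps = [max_gap(n) for n in (10, 100, 1000, 10000)]
for n, gap in zip((10, 100, 1000, 10000), gaps):
    print(n, gap)   # the gap shrinks as n increases
```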


Example: You Be the Judge. In a purse-snatching incident, a woman
described her assailant as being seven feet tall and wearing an orange hat,
red shirt, green trousers, and yellow shoes. A short while later and a few
blocks away a person fitting that description was seen and charged with
the crime.

In court, the prosecution argued that the characteristics of the assailant
were so rare as to make the evidence overwhelming that the defendant
was the criminal.

The defense argued that the description of the assailant was rare, and
that therefore the number of people fitting the description should follow a
Poisson distribution. Since one person fitting the description was found,
the best estimate for the parameter is $\lambda = 1$. Finally, they argued that
the relevant computation is the conditional probability that there is at least
one other person at large fitting the description given that one was
observed. The defense calculated

$$
\Pr\{X \ge 2 \mid X \ge 1\} = \frac{1 - \Pr\{X = 0\} - \Pr\{X = 1\}}{1 - \Pr\{X = 0\}}
= \frac{1 - e^{-1} - e^{-1}}{1 - e^{-1}} = 0.4180,
$$
l-e 


and since this figure is rather large, they argued that the circumstantial ev- 
idence arising out of the unusual description was too weak to satisfy the 
“beyond a reasonable doubt” criterion for guilt in criminal cases. 
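The defense's figure is easily reproduced from the Poisson mass function (3.8)
with parameter 1. The variable names in this short sketch are ours.

```python
from math import exp

lam = 1.0                      # the defense's estimate of the parameter
pr0 = exp(-lam)                # Pr{X = 0}
pr1 = lam * exp(-lam)          # Pr{X = 1}

# Conditional probability of at least two such persons given at least one.
pr_another = (1 - pr0 - pr1) / (1 - pr0)
print(pr_another)
```

To four decimal places the result agrees with the value 0.4180 quoted in the
example.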


3.5. The Multinomial Distribution 


This is a joint distribution of r variables in which only nonnegative
integer values 0, ..., n are possible. The joint probability mass function is

$$
\Pr\{X_1 = k_1, \ldots, X_r = k_r\} =
\begin{cases}
\dfrac{n!}{k_1!\cdots k_r!}\, p_1^{k_1}\cdots p_r^{k_r} & \text{if } k_1 + \cdots + k_r = n,\\[1ex]
0 & \text{otherwise},
\end{cases} \tag{3.10}
$$

where $p_i > 0$ for i = 1, ..., r and $p_1 + \cdots + p_r = 1$.

Some moments are $E[X_i] = np_i$, $\mathrm{Var}[X_i] = np_i(1 - p_i)$, and
$\mathrm{Cov}[X_i, X_j] = -np_i p_j$ for $i \ne j$.

The multinomial distribution generalizes the binomial. Consider an
experiment having a total of r possible outcomes, and let the corresponding
probabilities be $p_1, \ldots, p_r$, respectively. Now perform n independent
replications of the experiment and let $X_i$ record the total number of times
that the ith type outcome is observed in the n trials. Then
$X_1, \ldots, X_r$ has the multinomial distribution given in (3.10).
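For small n the stated moments can be verified by brute-force enumeration of
(3.10). The sketch below assumes r = 3 outcome types, n = 5 trials, and the
probabilities (0.2, 0.3, 0.5); these values and the function name
`multinomial_pmf` are illustrative choices of ours.

```python
from itertools import product
from math import factorial


def multinomial_pmf(ks, n, ps):
    """Joint mass function (3.10) evaluated at the counts ks."""
    if sum(ks) != n:
        return 0.0
    coef = factorial(n)
    for k in ks:
        coef //= factorial(k)      # stays an integer at every step
    prob = float(coef)
    for k, p in zip(ks, ps):
        prob *= p**k
    return prob


n, ps = 5, (0.2, 0.3, 0.5)         # illustrative values, not from the text
points = list(product(range(n + 1), repeat=len(ps)))
pmf = {ks: multinomial_pmf(ks, n, ps) for ks in points}

e1 = sum(ks[0] * q for ks, q in pmf.items())
e2 = sum(ks[1] * q for ks, q in pmf.items())
var1 = sum((ks[0] - e1) ** 2 * q for ks, q in pmf.items())
cov12 = sum((ks[0] - e1) * (ks[1] - e2) * q for ks, q in pmf.items())

print(e1, n * ps[0])                     # mean of X_1
print(var1, n * ps[0] * (1 - ps[0]))     # variance of X_1
print(cov12, -n * ps[0] * ps[1])         # covariance of X_1 and X_2
```

The negative covariance reflects the constraint that the counts sum to n: a
large count for one outcome type leaves fewer trials for the others.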


Exercises 


3.1. Consider tossing a fair coin five times and counting the total 
number of heads that appear. What is the probability that this total is 
three? 


3.2. A fraction p = 0.05 of the items coming off a production process 
are defective. If a random sample of 10 items is taken from the output of 
the process, what is the probability that the sample contains exactly one 
defective item? What is the probability that the sample contains one or 
fewer defective items? 


3.3. A fraction p = 0.05 of the items coming off of a production process 
are defective. The output of the process is sampled, one by one, in a ran- 
dom manner. What is the probability that the first defective item found is 
the tenth item sampled? 


3.4. A Poisson distributed random variable X has a mean of $\lambda = 2$.
What is the probability that X equals 2? What is the probability that X is
less than or equal to 2?


3.5. The number of bacteria in a prescribed area of a slide containing a
sample of well water has a Poisson distribution with parameter 5. What is
the probability that the slide shows 8 or more bacteria?


3.6. The discrete uniform distribution on {1, ..., n} corresponds to the
probability mass function

$$
p(k) = \begin{cases} \dfrac{1}{n} & \text{for } k = 1, \ldots, n,\\[1ex] 0 & \text{elsewhere}. \end{cases}
$$

(a) Determine the mean and variance.

(b) Suppose X and Y are independent random variables, each having
    the discrete uniform distribution on {0, ..., n}. Determine the
    probability mass function for the sum Z = X + Y.

(c) Under the assumptions of (b), determine the probability mass
    function for the minimum U = min{X, Y}.


Problems 


3.1. Suppose that X has a discrete uniform distribution on the integers
0, 1, ..., 9, and Y is independent and has the probability distribution
$\Pr\{Y = k\} = a_k$ for k = 0, 1, .... What is the distribution of
Z = X + Y (mod 10), their sum modulo 10?


3.2. The mode of a probability mass function p(k) is any value k* for
which $p(k^*) \ge p(k)$ for all k. Determine the mode(s) for


(a) The Poisson distribution with parameter $\lambda > 0$.
(b) The binomial distribution with parameters n and p.


3.3. Let X be a Poisson random variable with parameter $\lambda$. Determine
the probability that X is odd.


3.4. Let U be a Poisson random variable with mean $\mu$. Determine the
expected value of the random variable V = 1/(1 + U).


3.5. Let Y = N — X where X has a binomial distribution with parame- 
ters N and p. Evaluate the product moment E[XY] and the covariance 
Cov[X, Y]. 


3.6. Suppose $(X_1, X_2, X_3)$ has a multinomial distribution with parameters
M and $\pi_i > 0$ for i = 1, 2, 3, with $\pi_1 + \pi_2 + \pi_3 = 1$.


(a) Determine the marginal distribution for $X_1$.

(b) Find the distribution for $N = X_1 + X_2$.

(c) What is the conditional probability $\Pr\{X_1 = k \mid N = n\}$ for
    $0 \le k \le n$?


3.7. Let X and Y be independent Poisson distributed random variables
having means $\mu$ and $\nu$, respectively. Evaluate the convolution of their
mass functions to determine the probability distribution of their sum
Z = X + Y.


3.8. Let X and Y be independent binomial random variables having
parameters (N, p) and (M, p), respectively. Let Z = X + Y.


(a) Argue that Z has a binomial distribution with parameters (N + M, p)
    by writing X and Y as appropriate sums of Bernoulli random variables.

(b) Validate the result in (a) by evaluating the necessary convolution. 


3.9. Suppose that X and Y are independent random variables with the
geometric distribution

$$
p(k) = (1 - \pi)\pi^k \qquad \text{for } k = 0, 1, \ldots.
$$

Perform the appropriate convolution to identify the distribution of
Z = X + Y as a negative binomial.


3.10. Determine numerical values to three decimal places for 
Pr{X = k}, k = 0, 1, 2, when 


(a) X has a binomial distribution with parameters n = 10 and p = 0.1. 

(b) X has a binomial distribution with parameters n = 100 and p = 
0.01. 

(c) X has a Poisson distribution with parameter $\lambda = 1$.


3.11. Let X and Y be independent random variables sharing the geometric
distribution whose mass function is

$$
p(k) = (1 - \pi)\pi^k \qquad \text{for } k = 0, 1, \ldots,
$$

where $0 < \pi < 1$. Let U = min{X, Y}, V = max{X, Y}, and W = V - U.
Determine the joint probability mass function for U and W and show that
U and W are independent.


3.12. Suppose that the telephone calls coming into a certain switchboard
during a one-minute time interval follow a Poisson distribution with
mean $\lambda = 4$. If the switchboard can handle at most 6 calls per minute,
what is the probability that the switchboard will receive more calls than it
can handle during a specified one-minute interval?


3.13. Suppose that a sample of 10 is taken from a day’s output of a
machine that produces parts of which 5 percent are normally defective. If 100
percent of a day’s production is inspected whenever the sample of 10
gives 2 or more defective parts, then what is the probability that 100
percent of a day’s production will be inspected? What assumptions did you
make?


3.14. Suppose that a random variable Z has the geometric distribution

$$
p_Z(k) = p(1-p)^k \qquad \text{for } k = 0, 1, \ldots,
$$

where p = 0.10.


(a) Evaluate the mean and variance of Z. 
(b) What is the probability that Z strictly exceeds 10? 


3.15. Suppose that X is a Poisson distributed random variable with
mean $\lambda = 2$. Determine $\Pr\{X \le \lambda\}$.


3.16. Consider the generalized geometric distribution defined by

$$
p_k = b(1-p)^k \qquad \text{for } k = 1, 2, \ldots,
$$

and

$$
p_0 = 1 - \sum_{k=1}^{\infty} p_k,
$$

where 0 < p < 1 and $p \le b \le p/(1-p)$.


(a) Evaluate $p_0$ in terms of b and p.

(b) What does the generalized geometric distribution reduce to when
    b = p? When b = p/(1 - p)?

(c) Show that N = X + Z has the generalized geometric distribution
    when X is a Bernoulli random variable for which $\Pr\{X = 1\} = \alpha$,
    $0 < \alpha < 1$, and Z independently has the usual geometric
    distribution given in (3.4).


4. Important Continuous Distributions 


For future reference, this section catalogs several continuous distributions
and some of their properties.


4.1. The Normal Distribution


The normal distribution with parameters $\mu$ and $\sigma^2 > 0$ is given by
the familiar bell-shaped probability density function

$$
\phi(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty. \tag{4.1}
$$

The density function is symmetric about the point $\mu$, and the parameter
$\sigma^2$ is the variance of the distribution. The case $\mu = 0$ and
$\sigma^2 = 1$ is referred to as the standard normal distribution. If X is
normally distributed with mean $\mu$ and variance $\sigma^2$, then
$Z = (X - \mu)/\sigma$ has a standard normal distribution. By this means,
probability statements about arbitrary normal random variables can be reduced
to equivalent statements about standard normal random variables. The standard
normal density and distribution functions are given respectively by

$$
\phi(\xi) = \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}, \qquad -\infty < \xi < \infty, \tag{4.2}
$$

and

$$
\Phi(x) = \int_{-\infty}^{x} \phi(\xi)\,d\xi, \qquad -\infty < x < \infty. \tag{4.3}
$$

The central limit theorem explains in part the wide prevalence of the
normal distribution in nature. A simple form of this aptly named result
concerns the partial sums $S_n = \xi_1 + \cdots + \xi_n$ of independent and
identically distributed summands $\xi_1, \xi_2, \ldots$ having finite means
$\mu = E[\xi_k]$ and finite variances $\sigma^2 = \mathrm{Var}[\xi_k]$. In this
case, the central limit theorem asserts that

$$
\lim_{n \to \infty} \Pr\left\{\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\right\} = \Phi(x) \qquad \text{for all } x. \tag{4.4}
$$

The precise statement of the theorem’s conclusion is given by equation
(4.4). Intuition is sometimes enhanced by the looser statement that, for
large n, the sum $S_n$ is approximately normally distributed with mean
$n\mu$ and variance $n\sigma^2$.
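Equation (4.4) can be examined without simulation when the summands are
Bernoulli, because the distribution of the sum is then binomial and the
left-hand probability can be computed exactly. The sketch below assumes
Bernoulli summands with success probability 0.5 and the evaluation point
x = 1; both are our illustrative choices, as are the function names.

```python
from math import comb, erf, sqrt


def std_normal_cdf(x):
    """Standard normal distribution function (4.3), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def standardized_sum_cdf(x, n, p):
    """Exact Pr{(S_n - n*mu) / (sigma*sqrt(n)) <= x} for Bernoulli(p) summands."""
    mu, sigma = p, sqrt(p * (1 - p))
    cutoff = n * mu + x * sigma * sqrt(n)
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n + 1) if k <= cutoff)


x, p = 1.0, 0.5   # illustrative values, not taken from the text
errors = {n: abs(standardized_sum_cdf(x, n, p) - std_normal_cdf(x))
          for n in (10, 100, 1000)}
for n, err in errors.items():
    print(n, err)   # discrepancy between the exact probability and the normal limit
```

Because the sum takes only integer values, the discrepancy at a fixed x need
not decrease monotonically in n, although it tends to zero.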

In practical terms we expect the normal distribution to arise whenever
the numerical outcome of an experiment results from numerous small
additive effects, all operating independently, and where no single or small
group of effects is dominant.


The Lognormal Distribution


If the natural logarithm of a nonnegative random variable V is normally
distributed, then V is said to have a lognormal distribution. Conversely, if
X is normally distributed with mean $\mu$ and variance $\sigma^2$, then
$V = e^X$ defines a lognormally distributed random variable. The
change-of-variable formula (2.15) applies to give the density function for V
to be

$$
f_V(v) = \frac{1}{\sqrt{2\pi}\,\sigma v} \exp\left\{-\frac{1}{2}\left(\frac{\ln v - \mu}{\sigma}\right)^2\right\}, \qquad v \ge 0. \tag{4.5}
$$

The mean and variance are, respectively,

$$
\begin{aligned}
E[V] &= \exp\{\mu + \tfrac{1}{2}\sigma^2\},\\
\mathrm{Var}[V] &= \exp\{2(\mu + \tfrac{1}{2}\sigma^2)\}\,[\exp\{\sigma^2\} - 1].
\end{aligned} \tag{4.6}
$$


4.2. The Exponential Distribution 


A nonnegative random variable $T$ is said to have an exponential
distribution with parameter $\lambda > 0$ if the probability density
function is

$$f_T(t) = \begin{cases} \lambda e^{-\lambda t} & \text{for } t \ge 0, \\ 0 & \text{for } t < 0. \end{cases} \tag{4.7}$$

The corresponding distribution function is

$$F_T(t) = \begin{cases} 1 - e^{-\lambda t} & \text{for } t \ge 0, \\ 0 & \text{for } t < 0, \end{cases} \tag{4.8}$$

and the mean and variance are given, respectively, by

$$E[T] = \frac{1}{\lambda} \qquad \text{and} \qquad \mathrm{Var}[T] = \frac{1}{\lambda^2}.$$

Note that the parameter is the reciprocal of the mean and not the mean
itself.

The exponential distribution is fundamental in the theory of
continuous-time Markov chains (see Chapter V), due in major part to its
memoryless property, as now explained. Think of $T$ as a lifetime and,
given that the unit has survived up to time $t$, ask for the conditional
distribution of the remaining life $T - t$. Equivalently, for $x > 0$
determine the conditional probability $\Pr\{T - t > x \mid T > t\}$.
Directly applying the definition of conditional probability (see Section
2.7), we obtain

$$\begin{aligned}
\Pr\{T - t > x \mid T > t\} &= \frac{\Pr\{T > t + x,\ T > t\}}{\Pr\{T > t\}} \\
&= \frac{\Pr\{T > t + x\}}{\Pr\{T > t\}} \qquad \text{(because } x > 0\text{)} \\
&= \frac{e^{-\lambda(t+x)}}{e^{-\lambda t}} \qquad \text{[from (4.8)]} \\
&= e^{-\lambda x}.
\end{aligned} \tag{4.9}$$

There is no memory in the sense that
$\Pr\{T - t > x \mid T > t\} = e^{-\lambda x} = \Pr\{T > x\}$, and an item
that has survived for $t$ units of time has a remaining lifetime that is
statistically the same as that for a new item.
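
The memoryless property can be checked directly from the survival function.
The following sketch computes the conditional probability from its
definition and compares it with the unconditional tail; the parameter value
0.7, the ages, and the function names are hypothetical, chosen only for
illustration.

```python
import math


def survival(t: float, lam: float) -> float:
    """Pr{T > t} = 1 - F_T(t) for the exponential law, from (4.8)."""
    return math.exp(-lam * t) if t >= 0 else 1.0


def residual_survival(t: float, x: float, lam: float) -> float:
    """Pr{T - t > x | T > t}, computed from the definition of conditioning."""
    return survival(t + x, lam) / survival(t, lam)


if __name__ == "__main__":
    lam, x = 0.7, 1.2  # illustrative values
    for age in (0.0, 0.5, 3.0, 10.0):
        # Every line prints the same pair of numbers, whatever the age.
        print(age, residual_survival(age, x, lam), survival(x, lam))
```

The age of the item drops out of the ratio, which is exactly the content of
the calculation leading to (4.9).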

To view the memoryless property somewhat differently, we introduce the
hazard rate or failure rate $r(s)$ associated with a nonnegative random
variable $S$ having continuous density $g(s)$ and distribution function
$G(s) < 1$. The failure rate is defined by

$$r(s) = \frac{g(s)}{1 - G(s)} \qquad \text{for } s > 0. \tag{4.10}$$

We obtain the interpretation by calculating (see Section 2.2)

$$\begin{aligned}
\Pr\{s < S \le s + \Delta s \mid s < S\} &= \frac{\Pr\{s < S \le s + \Delta s\}}{\Pr\{s < S\}} \\
&= \frac{g(s)\,\Delta s}{1 - G(s)} + o(\Delta s) \qquad \text{[from (2.5)]} \\
&= r(s)\,\Delta s + o(\Delta s).
\end{aligned}$$

An item that has survived to time $s$ will then fail in the interval
$(s, s + \Delta s]$ with conditional probability $r(s)\,\Delta s + o(\Delta s)$,
thus motivating the name "failure rate."
We can invert (4.10) by integrating

$$-r(s) = \frac{-g(s)}{1 - G(s)} = \frac{d[1 - G(s)]/ds}{1 - G(s)} = \frac{d\{\ln[1 - G(s)]\}}{ds}$$

to obtain

$$-\int_0^t r(s)\, ds = \ln[1 - G(t)],$$

or

$$G(t) = 1 - \exp\left\{ -\int_0^t r(s)\, ds \right\}, \qquad t \ge 0,$$

which gives the distribution function explicitly in terms of the hazard rate.
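
The inversion formula lends itself to a short numerical sketch. The code
below integrates a given hazard rate with the trapezoidal rule and
exponentiates; the integration scheme, the step count, and the two sample
hazard rates (a constant and a linearly increasing one) are assumptions made
for illustration only.

```python
import math


def cdf_from_hazard(r, t: float, steps: int = 1000) -> float:
    """G(t) = 1 - exp(-integral_0^t r(s) ds), using the trapezoidal rule."""
    if t <= 0:
        return 0.0
    h = t / steps
    integral = 0.5 * (r(0.0) + r(t))
    integral += sum(r(i * h) for i in range(1, steps))
    integral *= h
    return 1.0 - math.exp(-integral)


if __name__ == "__main__":
    # Constant hazard 2 recovers the exponential law 1 - exp(-2 t).
    print(cdf_from_hazard(lambda s: 2.0, 1.5), 1.0 - math.exp(-3.0))
    # Hazard r(s) = 2 s gives G(t) = 1 - exp(-t**2).
    print(cdf_from_hazard(lambda s: 2.0 * s, 1.5), 1.0 - math.exp(-2.25))
```

A constant hazard rate reproduces the exponential distribution function,
while an increasing hazard rate describes an item that wears out with age.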

The exponential distribution is uniquely the continuous distribution
with the constant failure rate $r(t) = \lambda$. (See Exercise 4.8 for the
discrete analog.) The failure rate does not vary in time, another
reflection of the memoryless property.

Section 5 contains several exercises concerning the exponential distri- 
bution. In addition to providing practice in relevant algebraic and calcu- 
lus manipulations, these exercises are designed to enhance the reader’s in- 
tuition concerning the exponential law. 


4.3. The Uniform Distribution 


A random variable $U$ is uniformly distributed over the interval $[a, b]$,
where $a < b$, if it has the probability density function

$$f_U(u) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le u \le b, \\ 0 & \text{elsewhere.} \end{cases} \tag{4.11}$$

The uniform distribution extends the notion of "equally likely" to the
continuous case. The distribution function is

$$F_U(x) = \begin{cases} 0 & \text{for } x \le a, \\ \dfrac{x - a}{b - a} & \text{for } a < x \le b, \\ 1 & \text{for } x > b, \end{cases} \tag{4.12}$$

and the mean and variance are, respectively,

$$E[U] = \frac{1}{2}(a + b) \qquad \text{and} \qquad \mathrm{Var}[U] = \frac{(b - a)^2}{12}.$$

The uniform distribution on the unit interval $[0, 1]$, for which $a = 0$
and $b = 1$, is most prevalent.


4.4. The Gamma Distribution 


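The gamma distribution with parameters $\alpha > 0$ and $\lambda > 0$ has
probability density function

$$f(x) = \frac{\lambda}{\Gamma(\alpha)} (\lambda x)^{\alpha - 1} e^{-\lambda x} \qquad \text{for } x > 0. \tag{4.13}$$

Given an integer number $\alpha$ of independent exponentially distributed
random variables $Y_1, \ldots, Y_\alpha$ having common parameter $\lambda$,
their sum $X_\alpha = Y_1 + \cdots + Y_\alpha$ has the gamma density of
(4.13), from which we obtain the moments

$$E[X_\alpha] = \frac{\alpha}{\lambda} \qquad \text{and} \qquad \mathrm{Var}[X_\alpha] = \frac{\alpha}{\lambda^2},$$

these moment formulas holding for noninteger $\alpha$ as well.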


4.5. The Beta Distribution 


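The beta density with parameters $\alpha > 0$ and $\beta > 0$ is given by

$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1} & \text{for } 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases} \tag{4.14}$$

The mean and variance are, respectively,

$$E[X] = \frac{\alpha}{\alpha + \beta} \qquad \text{and} \qquad \mathrm{Var}[X] = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.$$

(The gamma and beta functions are defined and briefly discussed in
Section 6.)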
4.6. The Joint Normal Distribution


Let $\sigma_X$, $\sigma_Y$, $\mu_X$, $\mu_Y$, and $\rho$ be real constants
subject to $\sigma_X > 0$, $\sigma_Y > 0$, and $-1 < \rho < 1$. For real
variables $x$, $y$ define

$$Q(x, y) = \frac{1}{1 - \rho^2} \left\{ \left( \frac{x - \mu_X}{\sigma_X} \right)^2 - 2\rho \left( \frac{x - \mu_X}{\sigma_X} \right) \left( \frac{y - \mu_Y}{\sigma_Y} \right) + \left( \frac{y - \mu_Y}{\sigma_Y} \right)^2 \right\}. \tag{4.15}$$

The joint normal (or bivariate normal) distribution for random variables
$X$, $Y$ is defined by the density function

$$\phi_{X,Y}(x, y) = \frac{1}{2\pi \sigma_X \sigma_Y \sqrt{1 - \rho^2}} \exp\left\{ -\tfrac12 Q(x, y) \right\}, \qquad -\infty < x, y < \infty. \tag{4.16}$$

The moments are

$$E[X] = \mu_X, \qquad E[Y] = \mu_Y,$$
$$\mathrm{Var}[X] = \sigma_X^2, \qquad \mathrm{Var}[Y] = \sigma_Y^2,$$

and

$$\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = \rho \sigma_X \sigma_Y.$$

The dimensionless parameter $\rho$ is called the correlation coefficient.
When $\rho$ is positive, then positive values of $X$ are (stochastically)
associated with positive values of $Y$. When $\rho$ is negative, then
positive values of $X$ are associated with negative values of $Y$. If
$\rho = 0$, then $X$ and $Y$ are independent random variables.


Linear Combinations of Normally Distributed Random Variables 


Suppose $X$ and $Y$ have the bivariate normal density (4.16), and let
$Z = aX + bY$ for arbitrary constants $a$, $b$. Then $Z$ is normally
distributed with mean

$$E[Z] = a\mu_X + b\mu_Y$$

and variance

$$\mathrm{Var}[Z] = a^2 \sigma_X^2 + 2ab\rho\sigma_X\sigma_Y + b^2 \sigma_Y^2.$$
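
The two moment formulas for a linear combination translate directly into
code. The helper below is a small sketch; its name, argument order, and the
sample numbers are hypothetical.

```python
def linear_combination_moments(a, b, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean and variance of Z = a*X + b*Y for jointly normal X, Y."""
    mean = a * mu_x + b * mu_y
    var = (
        a * a * sigma_x**2
        + 2 * a * b * rho * sigma_x * sigma_y
        + b * b * sigma_y**2
    )
    return mean, var


if __name__ == "__main__":
    # Illustrative values: Z = 2X - Y with correlation 0.5.
    print(linear_combination_moments(2, -1, 1.0, 3.0, 1.0, 2.0, 0.5))
```

With $\rho = 0$ the cross term vanishes and the variances simply add with
weights $a^2$ and $b^2$.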


A random vector $X_1, \ldots, X_n$ is said to have a multivariate normal
distribution, or a joint normal distribution, if every linear combination
$a_1 X_1 + \cdots + a_n X_n$, $a_i$ real, has a univariate normal
distribution. Obviously, if $X_1, \ldots, X_n$ has a joint normal
distribution, then so does the random vector $Y_1, \ldots, Y_m$ defined by
the linear transformation in which

$$Y_j = a_{j1} X_1 + \cdots + a_{jn} X_n \qquad \text{for } j = 1, \ldots, m,$$

for arbitrary constants $a_{ji}$.


Exercises 


4.1. The lifetime, in years, of a certain class of light bulbs has an
exponential distribution with parameter $\lambda = 2$. What is the
probability that a bulb selected at random from this class will last more
than 1.5 years? What is the probability that a bulb selected at random will
last exactly 1.5 years?


4.2. The median of a random variable $X$ is any value $a$ for which
$\Pr\{X \ge a\} \ge \tfrac12$ and $\Pr\{X \le a\} \ge \tfrac12$. Determine
the median of an exponentially distributed random variable with parameter
$\lambda$. Compare the median to the mean.


4.3. The lengths, in inches, of cotton fibers used in a certain mill are
exponentially distributed random variables with parameter $\lambda$. It is
decided to convert all measurements in this mill to the metric system.
Describe the probability distribution of the length, in centimeters, of
cotton fibers in this mill.


4.4. Twelve independent random variables, each uniformly distributed
over the interval (0, 1], are added, and 6 is subtracted from the total.
Determine the mean and variance of the resulting random variable.


4.5. Let $X$ and $Y$ have the joint normal distribution described in
equation (4.16). What value of $\alpha$ minimizes the variance of
$Z = \alpha X + (1 - \alpha) Y$? Simplify your result when $X$ and $Y$ are
independent.


4.6. Suppose that $U$ has a uniform distribution on the interval [0, 1].
Derive the density function for the random variables

(a) $Y = -\ln(1 - U)$.
(b) $W_n = U^n$ for $n \ge 1$.

Hint: Refer to Section 2.6.


4.7. Given independent exponentially distributed random variables $S$
and $T$ with common parameter $\lambda$, determine the probability density
function of the sum $R = S + T$ and identify its type by name.


4.8. Let $Z$ be a random variable with the geometric probability mass
function

$$p(k) = (1 - \pi)\pi^k, \qquad k = 0, 1, \ldots,$$

where $0 < \pi < 1$.

(a) Show that $Z$ has a constant failure rate in the sense that
$\Pr\{Z = k \mid Z \ge k\} = 1 - \pi$ for $k = 0, 1, \ldots$.

(b) Suppose $Z'$ is a discrete random variable whose possible values
are $0, 1, \ldots$ and for which $\Pr\{Z' = k \mid Z' \ge k\} = 1 - \pi$ for
$k = 0, 1, \ldots$. Show that the probability mass function for $Z'$ is $p(k)$.


Problems 


4.1. Evaluate the moment $E[e^{\lambda Z}]$, where $\lambda$ is an
arbitrary real number and $Z$ is a random variable following a standard
normal distribution, by integrating

$$E[e^{\lambda Z}] = \int_{-\infty}^{+\infty} e^{\lambda z}\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz.$$

Hint: Complete the square
$-\tfrac12 z^2 + \lambda z = -\tfrac12[(z - \lambda)^2 - \lambda^2]$ and use
the fact that

$$\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(z - \lambda)^2/2}\, dz = 1.$$


4.2. Let $W$ be an exponentially distributed random variable with
parameter $\theta$ and mean $\mu = 1/\theta$.

(a) Determine $\Pr\{W > \mu\}$.
(b) What is the mode of the distribution?


4.3. Let $X$ and $Y$ be independent random variables uniformly distributed
over the interval $[\theta - \tfrac12, \theta + \tfrac12]$ for some fixed
$\theta$. Show that $W = X - Y$ has a distribution that is independent of
$\theta$ with density function

$$f_W(w) = \begin{cases} 1 + w & \text{for } -1 \le w < 0, \\ 1 - w & \text{for } 0 \le w \le 1, \\ 0 & \text{for } |w| > 1. \end{cases}$$


4.4. Suppose that the diameters of bearings are independent normally
distributed random variables with mean $\mu_B = 1.005$ inch and variance
$\sigma_B^2 = (0.003)^2$ inch$^2$. The diameters of shafts are independent
normally distributed random variables having mean $\mu_S = 0.995$ inch and
variance $\sigma_S^2 = (0.004)^2$ inch$^2$.

[Figure: a shaft of diameter S and a bearing of diameter B.]

Let $S$ be the diameter of a shaft taken at random and let $B$ be the
diameter of a bearing.

(a) What is the probability $\Pr\{S > B\}$ of interference?

(b) What is the probability of one or fewer interferences in 20 random
shaft-bearing pairs?

Hint: The clearance, defined by $C = B - S$, is normally distributed
(why?), and interference occurs only if $C < 0$.


4.5. If $X$ follows an exponential distribution with parameter
$\alpha = 2$, and independently, $Y$ follows an exponential distribution
with parameter $\beta = 3$, what is the probability that $X < Y$?
5. Some Elementary Exercises


We have collected in this section a number of exercises that go beyond 
what is usually covered in a first course in probability. 


5.1. Tail Probabilities 


In mathematics, what is a “trick” upon first encounter becomes a basic 
tool when familiarity through use is established. In dealing with nonneg- 
ative random variables, we can often simplify the analysis by the trick of 
approaching the problem through the upper tail probabilities of the form 
Pr{X > x}. Consider the following example. 

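A jar has $n$ chips numbered $1, 2, \ldots, n$. A person draws a chip,
returns it, draws another, returns it, and so on, until a chip is drawn
that has been drawn before. Let $X$ be the number of drawings. Find the
probability distribution for $X$.

It is easier to compute $\Pr\{X > k\}$ first. Then, $\Pr\{X > 1\} = 1$,
since at least two draws are always required. The event $\{X > 2\}$ occurs
when distinct numbers appear on the first two draws, whence
$\Pr\{X > 2\} = (n/n)[(n - 1)/n]$. Continuing in this manner, we obtain

$$\Pr\{X > k\} = 1 \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) \cdots \left( 1 - \frac{k - 1}{n} \right) \qquad \text{for } k = 1, \ldots, n + 1. \tag{5.1}$$

Finally,

$$\begin{aligned}
\Pr\{X = k\} &= \Pr\{X > k - 1\} - \Pr\{X > k\} \\
&= \frac{k - 1}{n} \left( 1 - \frac{1}{n} \right) \cdots \left( 1 - \frac{k - 2}{n} \right) \qquad \text{for } k = 2, \ldots, n + 1.
\end{aligned}$$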
Now try deriving $\Pr\{X = k\}$ directly, for comparison with the "trick"
approach.

The usefulness of the upper tail probabilities is enhanced by the formula

$$E[X] = \sum_{k=0}^{\infty} \Pr\{X > k\} = \sum_{k=1}^{\infty} \Pr\{X \ge k\}, \tag{5.2}$$

valid for nonnegative integer-valued random variables $X$. To establish
(5.2), abbreviate the notation by using $p(k) = \Pr\{X = k\}$, and rearrange
the terms in $E[X] = \sum_{k \ge 0} k\,p(k)$ as follows:

$$\begin{aligned}
E[X] &= 0p(0) + 1p(1) + 2p(2) + 3p(3) + \cdots \\
&= p(1) + p(2) + p(3) + p(4) + \cdots \\
&\qquad\quad\ + p(2) + p(3) + p(4) + \cdots \\
&\qquad\qquad\qquad\ \ + p(3) + p(4) + \cdots \\
&\qquad\qquad\qquad\qquad\quad\ \ \, + p(4) + \cdots \\
&= \Pr\{X \ge 1\} + \Pr\{X \ge 2\} + \Pr\{X \ge 3\} + \cdots,
\end{aligned}$$

thus establishing (5.2).
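For the chip drawing problem, the mean number of draws required,
then, is

$$E[X] = \Pr\{X > 0\} + \Pr\{X > 1\} + \cdots + \Pr\{X > n\},$$

since $\Pr\{X > k\} = 0$ for $k > n$. Then substituting (5.1) into (5.2)
leads directly to

$$E[X] = 2 + \left( 1 - \frac{1}{n} \right) + \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) + \cdots + \left( 1 - \frac{1}{n} \right) \left( 1 - \frac{2}{n} \right) \cdots \left( 1 - \frac{n - 1}{n} \right).$$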
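
The chip-drawing calculation can be reproduced in exact arithmetic. The
sketch below builds the tail probabilities as products of the factors
$(1 - j/n)$, differences them to get the mass function, and sums the tails
to get the mean; the function names are hypothetical.

```python
from fractions import Fraction


def tail(n: int, k: int) -> Fraction:
    """Pr{X > k}: the first k draws from n chips are all distinct."""
    prob = Fraction(1)
    for j in range(k):  # factors (1 - 0/n)(1 - 1/n)...(1 - (k-1)/n)
        prob *= Fraction(n - j, n)
    return prob


def pmf(n: int, k: int) -> Fraction:
    """Pr{X = k} = Pr{X > k - 1} - Pr{X > k} for k >= 1."""
    return tail(n, k - 1) - tail(n, k)


def mean_draws(n: int) -> Fraction:
    """E[X] as the sum of Pr{X > k} over k = 0, ..., n."""
    return sum(tail(n, k) for k in range(0, n + 1))


if __name__ == "__main__":
    n = 4
    print([pmf(n, k) for k in range(1, n + 2)])
    print(mean_draws(n), sum(k * pmf(n, k) for k in range(1, n + 2)))
```

Both routes to the mean agree exactly, as the rearrangement argument for
the tail-sum formula guarantees.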


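Now let $X$ be a nonnegative continuous random variable with density $f(x)$
and distribution function $F(x)$. The analog to (5.2) is

$$E[X] = \int_0^{\infty} [1 - F(z)]\, dz, \tag{5.3}$$

obtained by interchanging an order of integration as follows:

$$\begin{aligned}
E[X] &= \int_0^{\infty} x f(x)\, dx = \int_0^{\infty} \left( \int_0^x dz \right) f(x)\, dx \\
&= \int_0^{\infty} \left[ \int_z^{\infty} f(x)\, dx \right] dz = \int_0^{\infty} [1 - F(z)]\, dz.
\end{aligned}$$

Interchanging the order of integration where the limits are variables
often proves difficult for many students. The trick of using indicator
functions to make the limits of integration constant may simplify matters.
In the preceding interchange, let

$$\mathbf{1}(z < x) = \begin{cases} 1 & \text{if } 0 \le z < x, \\ 0 & \text{otherwise,} \end{cases}$$

and then

$$\begin{aligned}
\int_0^{\infty} \left[ \int_0^x dz \right] f(x)\, dx &= \int_0^{\infty} \left[ \int_0^{\infty} \mathbf{1}(z < x) f(x)\, dz \right] dx \\
&= \int_0^{\infty} \left[ \int_0^{\infty} \mathbf{1}(z < x) f(x)\, dx \right] dz = \int_0^{\infty} \left[ \int_z^{\infty} f(x)\, dx \right] dz.
\end{aligned}$$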


As an application of (5.3), let $X_c = \min\{c, X\}$ for some positive
constant $c$. For example, suppose $X$ is the failure time of a certain
piece of equipment. A planned replacement policy is put in use that calls
for replacement of the equipment upon its failure or upon its reaching age
$c$, whichever occurs first. Then

$$X_c = \min\{c, X\} = \begin{cases} X & \text{if } X \le c, \\ c & \text{if } X > c \end{cases}$$

is the time for replacement.

Now

$$\Pr\{X_c > z\} = \begin{cases} 1 - F(z) & \text{if } 0 \le z < c, \\ 0 & \text{if } c \le z, \end{cases}$$

whence we obtain

$$E[X_c] = \int_0^c [1 - F(z)]\, dz,$$

which is decidedly shorter than

$$E[X_c] = \int_0^c x f(x)\, dx + c[1 - F(c)].$$

Observe that $X_c$ is a random variable whose distribution is partly
continuous and partly discrete, thus establishing by example that such
distributions do occur in practical applications.
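
The two expressions for $E[X_c]$ can be compared numerically. The sketch
below assumes, purely for illustration, an exponential failure time with
parameter 0.5 and replacement age $c = 3$, and uses a simple midpoint rule;
none of these choices comes from the text.

```python
import math


def midpoint_integral(func, a: float, b: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def mean_replacement_time_tail(F, c: float) -> float:
    """E[min(c, X)] as the integral of 1 - F(z) over [0, c]."""
    return midpoint_integral(lambda z: 1.0 - F(z), 0.0, c)


def mean_replacement_time_direct(f, F, c: float) -> float:
    """E[min(c, X)] as integral of x f(x) over [0, c] plus c * Pr{X > c}."""
    return midpoint_integral(lambda x: x * f(x), 0.0, c) + c * (1.0 - F(c))


if __name__ == "__main__":
    lam, c = 0.5, 3.0  # illustrative failure rate and replacement age
    F = lambda z: 1.0 - math.exp(-lam * z)
    f = lambda x: lam * math.exp(-lam * x)
    print(mean_replacement_time_tail(F, c))
    print(mean_replacement_time_direct(f, F, c))
    print((1.0 - math.exp(-lam * c)) / lam)  # closed form for this example
```

For the exponential case both integrals reduce to
$(1 - e^{-\lambda c})/\lambda$, which is always smaller than the
unrestricted mean $1/\lambda$.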


5.2. The Exponential Distribution 


This exercise is designed to foster intuition about the exponential distrib- 
ution, as well as to provide practice in algebraic and calculus manipula- 
tions relevant to stochastic modeling. 

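Let $X_0$ and $X_1$ be independent exponentially distributed random
variables with respective parameters $\lambda_0$ and $\lambda_1$, so that

$$\Pr\{X_i > t\} = e^{-\lambda_i t} \qquad \text{for } t \ge 0,\ i = 0, 1.$$

Let

$$N = \begin{cases} 0 & \text{if } X_0 \le X_1, \\ 1 & \text{if } X_1 < X_0, \end{cases}$$

$$U = \min\{X_0, X_1\} = X_N;$$
$$M = 1 - N;$$
$$V = \max\{X_0, X_1\} = X_M;$$

and

$$W = V - U = |X_0 - X_1|.$$

In this context, we derive the following:

(a) $\Pr\{N = 0 \text{ and } U > t\} = e^{-(\lambda_0 + \lambda_1)t}\, \dfrac{\lambda_0}{\lambda_0 + \lambda_1}$ for $t \ge 0$.

The event $\{N = 0 \text{ and } U > t\}$ is exactly the event
$\{t < X_0 \le X_1\}$, whence

$$\begin{aligned}
\Pr\{N = 0, U > t\} &= \Pr\{t < X_0 < X_1\} \\
&= \iint_{t < x_0 < x_1} \lambda_0 e^{-\lambda_0 x_0}\, \lambda_1 e^{-\lambda_1 x_1}\, dx_1\, dx_0 \\
&= \int_t^{\infty} \left( \int_{x_0}^{\infty} \lambda_1 e^{-\lambda_1 x_1}\, dx_1 \right) \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \int_t^{\infty} e^{-\lambda_1 x_0}\, \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \frac{\lambda_0}{\lambda_0 + \lambda_1} \int_t^{\infty} (\lambda_0 + \lambda_1) e^{-(\lambda_0 + \lambda_1) x_0}\, dx_0 \\
&= \frac{\lambda_0}{\lambda_0 + \lambda_1}\, e^{-(\lambda_0 + \lambda_1)t}.
\end{aligned}$$

(b) $\Pr\{N = 0\} = \dfrac{\lambda_0}{\lambda_0 + \lambda_1}$ and $\Pr\{N = 1\} = \dfrac{\lambda_1}{\lambda_0 + \lambda_1}$.

We use the result in (a) as follows:

$$\Pr\{N = 0\} = \Pr\{N = 0, U > 0\} = \frac{\lambda_0}{\lambda_0 + \lambda_1} \qquad \text{from (a)}.$$

Obviously, $\Pr\{N = 1\} = 1 - \Pr\{N = 0\} = \lambda_1/(\lambda_0 + \lambda_1)$.

(c) $\Pr\{U > t\} = e^{-(\lambda_0 + \lambda_1)t}$, $t \ge 0$.

Upon adding the result in (a),

$$\Pr\{N = 0 \text{ and } U > t\} = e^{-(\lambda_0 + \lambda_1)t}\, \frac{\lambda_0}{\lambda_0 + \lambda_1},$$

to the corresponding quantity associated with $N = 1$,

$$\Pr\{N = 1 \text{ and } U > t\} = e^{-(\lambda_0 + \lambda_1)t}\, \frac{\lambda_1}{\lambda_0 + \lambda_1},$$

we obtain the desired result via

$$\begin{aligned}
\Pr\{U > t\} &= \Pr\{N = 0, U > t\} + \Pr\{N = 1, U > t\} \\
&= e^{-(\lambda_0 + \lambda_1)t} \left( \frac{\lambda_0}{\lambda_0 + \lambda_1} + \frac{\lambda_1}{\lambda_0 + \lambda_1} \right) \\
&= e^{-(\lambda_0 + \lambda_1)t}.
\end{aligned}$$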
At this point observe that $U$ and $N$ are independent random variables.
This follows because (a), (b), and (c) together give

$$\Pr\{N = 0 \text{ and } U > t\} = \Pr\{N = 0\} \times \Pr\{U > t\}.$$

Think about this remarkable result for a moment. Suppose $X_0$ and $X_1$
represent lifetimes, and $\lambda_0 = 0.001$, while $\lambda_1 = 1$. The
mean lifetimes are $E[X_0] = 1000$ and $E[X_1] = 1$. Suppose we observe
that the time of the first death is rather small, say,
$U = \min\{X_0, X_1\} = \tfrac12$. In spite of vast disparity between the
mean lifetimes, the observation that $U = \tfrac12$ provides no information
about which of the two units, 0 or 1, was first to die! This apparent
paradox is yet another, more subtle, manifestation of the memoryless
property unique to the exponential density.
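
A small simulation makes the independence of $U$ and $N$ tangible. The
sketch below is an illustration only: the rates 1 and 3, the sample size,
the seed, and the function names are all hypothetical choices, and a
simulation can suggest the result but does not prove it.

```python
import random


def simulate_first_failure(lam0: float, lam1: float, trials: int, seed: int = 12345):
    """Return a list of (N, U) pairs: which unit failed first, and when."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x0 = rng.expovariate(lam0)
        x1 = rng.expovariate(lam1)
        samples.append((0, x0) if x0 <= x1 else (1, x1))
    return samples


def fraction_unit0_first(samples, t: float) -> float:
    """Empirical Pr{N = 0 | U > t}."""
    kept = [n for n, u in samples if u > t]
    return sum(1 for n in kept if n == 0) / len(kept)


if __name__ == "__main__":
    lam0, lam1 = 1.0, 3.0  # illustrative rates; Pr{N = 0} = 1/4
    samples = simulate_first_failure(lam0, lam1, trials=200_000)
    for t in (0.0, 0.1, 0.3):
        print(t, fraction_unit0_first(samples, t))
```

Conditioning on a later first failure leaves the estimated chance that unit
0 failed first near $\lambda_0/(\lambda_0 + \lambda_1)$, as parts (a)
through (c) predict.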

We continue with the exercise. 


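(d) $\Pr\{W > t \mid N = 0\} = e^{-\lambda_1 t}$, $t \ge 0$.

The event $\{W > t \text{ and } N = 0\}$ for $t \ge 0$ corresponds exactly
to the event $\{t < X_1 - X_0\}$. Thus

$$\begin{aligned}
\Pr\{W > t \text{ and } N = 0\} &= \Pr\{X_1 - X_0 > t\} \\
&= \iint_{x_1 - x_0 > t} \lambda_0 e^{-\lambda_0 x_0}\, \lambda_1 e^{-\lambda_1 x_1}\, dx_0\, dx_1 \\
&= \int_0^{\infty} \left( \int_{x_0 + t}^{\infty} \lambda_1 e^{-\lambda_1 x_1}\, dx_1 \right) \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \int_0^{\infty} e^{-\lambda_1 (x_0 + t)}\, \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \frac{\lambda_0}{\lambda_0 + \lambda_1}\, e^{-\lambda_1 t} \int_0^{\infty} (\lambda_0 + \lambda_1) e^{-(\lambda_0 + \lambda_1) x_0}\, dx_0 \\
&= \frac{\lambda_0}{\lambda_0 + \lambda_1}\, e^{-\lambda_1 t} \\
&= \Pr\{N = 0\}\, e^{-\lambda_1 t} \qquad \text{[from (b)]}.
\end{aligned}$$

Then, using the basic definition of conditional probability (Section 2.7),
we obtain

$$\Pr\{W > t \mid N = 0\} = \frac{\Pr\{W > t, N = 0\}}{\Pr\{N = 0\}} = e^{-\lambda_1 t}, \qquad t \ge 0,$$

as desired.

Of course a parallel formula holds conditional on $N = 1$:

$$\Pr\{W > t \mid N = 1\} = e^{-\lambda_0 t}, \qquad t \ge 0,$$

and using the law of total probability we obtain the distribution of $W$ in
the form

$$\begin{aligned}
\Pr\{W > t\} &= \Pr\{W > t, N = 0\} + \Pr\{W > t, N = 1\} \\
&= \frac{\lambda_0}{\lambda_0 + \lambda_1}\, e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_0 + \lambda_1}\, e^{-\lambda_0 t}, \qquad t \ge 0.
\end{aligned}$$

(e) $U$ and $W = V - U$ are independent random variables.

To establish this final consequence of the memoryless property, it
suffices to show that

$$\Pr\{U > u \text{ and } W > w\} = \Pr\{U > u\}\, \Pr\{W > w\} \qquad \text{for all } u \ge 0,\ w \ge 0.$$

Determining first

$$\begin{aligned}
\Pr\{N = 0, U > u, W > w\} &= \Pr\{u < X_0 < X_1 - w\} \\
&= \iint_{u < x_0 < x_1 - w} \lambda_0 e^{-\lambda_0 x_0}\, \lambda_1 e^{-\lambda_1 x_1}\, dx_0\, dx_1 \\
&= \int_u^{\infty} \left( \int_{x_0 + w}^{\infty} \lambda_1 e^{-\lambda_1 x_1}\, dx_1 \right) \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \int_u^{\infty} e^{-\lambda_1 (x_0 + w)}\, \lambda_0 e^{-\lambda_0 x_0}\, dx_0 \\
&= \left( \frac{\lambda_0}{\lambda_0 + \lambda_1} \right) e^{-\lambda_1 w} \int_u^{\infty} (\lambda_0 + \lambda_1) e^{-(\lambda_0 + \lambda_1) x_0}\, dx_0 \\
&= \left( \frac{\lambda_0}{\lambda_0 + \lambda_1} \right) e^{-\lambda_1 w}\, e^{-(\lambda_0 + \lambda_1) u},
\end{aligned}$$

and then, by symmetry,

$$\Pr\{N = 1, U > u, W > w\} = \left( \frac{\lambda_1}{\lambda_0 + \lambda_1} \right) e^{-\lambda_0 w}\, e^{-(\lambda_0 + \lambda_1) u},$$

and finally adding the two expressions, we obtain

$$\begin{aligned}
\Pr\{U > u, W > w\} &= \left[ \left( \frac{\lambda_0}{\lambda_0 + \lambda_1} \right) e^{-\lambda_1 w} + \left( \frac{\lambda_1}{\lambda_0 + \lambda_1} \right) e^{-\lambda_0 w} \right] e^{-(\lambda_0 + \lambda_1) u} \\
&= \Pr\{W > w\}\, \Pr\{U > u\}, \qquad u, w \ge 0.
\end{aligned}$$

The calculation is complete.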


Exercises 


5.1. Let $X$ have a binomial distribution with parameters $n = 4$ and
$p = \tfrac14$. Compute the probabilities $\Pr\{X \ge k\}$ for
$k = 1, 2, 3, 4$, and sum these to verify that the mean of the distribution
is 1.


5.2. A jar has four chips colored red, green, blue, and yellow. A person
draws a chip, observes its color, and returns it. Chips are now drawn
repeatedly, without replacement, until the first chip drawn is selected
again. What is the mean number of draws required?


5.3. Let $X$ be an exponentially distributed random variable with
parameter $\lambda$. Determine the mean of $X$

(a) by integrating by parts in the definition in equation (2.7) with
$m = 1$;

(b) by integrating the upper tail probabilities in accordance with
equation (5.3).

Which method do you find easier?


5.4. A system has two components: A and B. The operating times until
failure of the two components are independent and exponentially distributed
random variables with parameters 2 for component A, and 3 for B. The system
fails at the first component failure.

(a) What is the mean time to failure for component A? For component B?

(b) What is the mean time to system failure?

(c) What is the probability that it is component A that causes system
failure?

(d) Suppose that it is component A that fails first. What is the mean
remaining operating life of component B?


5.5. Consider a post office with two clerks. John, Paul, and Naomi enter
simultaneously. John and Paul go directly to the clerks, while Naomi must
wait until either John or Paul is finished before she begins service.

(a) If all of the service times are independent exponentially distributed
random variables with the same mean $1/\lambda$, what is the probability
that Naomi is still in the post office after the other two have left?

(b) How does your answer change if the two clerks have different service
rates, say $\lambda_1 = 3$ and $\lambda_2 = 47$?

(c) The mean time that Naomi spends in the post office is less than that
for John or Paul provided that
$\max\{\lambda_1, \lambda_2\} > c \min\{\lambda_1, \lambda_2\}$ for a
certain constant $c$. What is the value of this constant?


Problems 


5.1. Let $X_1, X_2, \ldots$ be independent and identically distributed
random variables having the cumulative distribution function
$F(x) = \Pr\{X_k \le x\}$. For a fixed number $\xi$, let $N$ be the first
index $k$ for which $X_k > \xi$. That is, $N = 1$ if $X_1 > \xi$; $N = 2$ if
$X_1 \le \xi$ and $X_2 > \xi$; etc. Determine the probability mass function
for $N$.


5.2. Let $X_1, X_2, \ldots, X_n$ be independent random variables, all
exponentially distributed with the same parameter $\lambda$. Determine the
distribution function for the minimum $Z = \min\{X_1, \ldots, X_n\}$.


5.3. Suppose that $X$ is a discrete random variable having the geometric
distribution whose probability mass function is

$$p(k) = p(1 - p)^k \qquad \text{for } k = 0, 1, \ldots.$$

(a) Determine the upper tail probabilities $\Pr\{X > k\}$ for $k = 0, 1, \ldots$.
(b) Evaluate the mean via $E[X] = \sum_{k \ge 0} \Pr\{X > k\}$.


5.4. Let $V$ be a continuous random variable taking both positive and
negative values and whose mean exists. Derive the formula

$$E[V] = \int_0^{\infty} [1 - F_V(v)]\, dv - \int_{-\infty}^{0} F_V(v)\, dv.$$


5.5. Show that

$$E[W^2] = \int_0^{\infty} 2y[1 - F_W(y)]\, dy$$

for a nonnegative random variable $W$.


5.6. Determine the upper tail probabilities $\Pr\{V > t\}$ and mean $E[V]$
for a random variable $V$ having the exponential density

$$f_V(v) = \begin{cases} 0 & \text{for } v < 0, \\ \lambda e^{-\lambda v} & \text{for } v \ge 0, \end{cases}$$

where $\lambda$ is a fixed positive parameter.


5.7. Let $X_1, X_2, \ldots, X_n$ be independent random variables that are
exponentially distributed with respective parameters
$\lambda_1, \lambda_2, \ldots, \lambda_n$. Identify the distribution of the
minimum $V = \min\{X_1, X_2, \ldots, X_n\}$.

Hint: For any real number $v$, the event $\{V > v\}$ is equivalent to
$\{X_1 > v, X_2 > v, \ldots, X_n > v\}$.


5.8. Let $U_1, U_2, \ldots, U_n$ be independent uniformly distributed
random variables on the unit interval [0, 1]. Define the minimum
$V_n = \min\{U_1, U_2, \ldots, U_n\}$.

(a) Show that $\Pr\{V_n > v\} = (1 - v)^n$ for $0 \le v \le 1$.
(b) Let $W_n = nV_n$. Show that $\Pr\{W_n > w\} = [1 - (w/n)]^n$ for
$0 \le w \le n$, and thus

$$\lim_{n \to \infty} \Pr\{W_n > w\} = e^{-w} \qquad \text{for } w \ge 0.$$


5.9. A flashlight requires two good batteries in order to shine. Suppose,
for the sake of this academic exercise, that the lifetimes of batteries in
use are independent random variables that are exponentially distributed
with parameter $\lambda = 1$. Reserve batteries do not deteriorate. You
begin with five fresh batteries. On average, how long can you shine your
light?


6. Useful Functions, Integrals, and Sums


Collected here for later reference are some calculations and formulas that
are especially pertinent in probability modeling.

We begin with several exponential integrals, the first and simplest being

$$\int e^{-x}\, dx = -e^{-x}. \tag{6.1}$$

When we use integration by parts, the second integral that we introduce
reduces to the first in the manner

$$\int x e^{-x}\, dx = -x e^{-x} + \int e^{-x}\, dx = -e^{-x}(1 + x). \tag{6.2}$$

Then (6.1) and (6.2) are the special cases of $\alpha = 1$ and
$\alpha = 2$, respectively, in the general formula, valid for any real
number $\alpha$ for which the integrals are defined, given by

$$\int x^{\alpha - 1} e^{-x}\, dx = -x^{\alpha - 1} e^{-x} + (\alpha - 1) \int x^{\alpha - 2} e^{-x}\, dx. \tag{6.3}$$

Fixing the limits of integration leads to the gamma function, defined by

$$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx \qquad \text{for } \alpha > 0. \tag{6.4}$$

From (6.3) it follows that

$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1), \tag{6.5}$$

and therefore, for any integer $k$,

$$\Gamma(k) = (k - 1)(k - 2) \cdots 2 \cdot \Gamma(1). \tag{6.6}$$
An easy consequence of (6.1) is the evaluation $\Gamma(1) = 1$, which with
(6.5) shows that the gamma function at integral arguments is a
generalization of the factorial function, and

$$\Gamma(k) = (k - 1)! \qquad \text{for } k = 1, 2, \ldots. \tag{6.7}$$

A more difficult integration shows that

$$\Gamma(\tfrac12) = \sqrt{\pi}, \tag{6.8}$$

which with (6.6) provides

$$\Gamma(n + \tfrac12) = \frac{1 \times 3 \times 5 \times \cdots \times (2n - 1)}{2^n}\, \sqrt{\pi} \qquad \text{for } n = 0, 1, \ldots. \tag{6.9}$$
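
These evaluations are easy to spot-check with the gamma function in
Python's standard `math` module. The helper name below is hypothetical, and
the checks are numerical comparisons to floating-point accuracy, not proofs.

```python
import math


def gamma_half_integer(n: int) -> float:
    """Gamma(n + 1/2) from (6.9): (1*3*5*...*(2n-1)) / 2**n * sqrt(pi)."""
    odd_product = 1
    for j in range(1, n + 1):
        odd_product *= 2 * j - 1
    return odd_product / 2**n * math.sqrt(math.pi)


if __name__ == "__main__":
    # (6.7): Gamma(k) equals (k - 1)! at positive integers.
    for k in range(1, 6):
        print(k, math.gamma(k), math.factorial(k - 1))
    # (6.8) and (6.9): half-integer arguments.
    for n in range(0, 4):
        print(n, math.gamma(n + 0.5), gamma_half_integer(n))
```

The empty product for $n = 0$ is 1, so the helper returns $\sqrt{\pi}$
there, matching (6.8).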


Stirling's formula is the following important asymptotic evaluation of
the factorial function:

$$n! = n^n e^{-n} (2\pi n)^{1/2} e^{r(n)/12n}, \tag{6.10}$$

in which

$$1 - \frac{1}{12n + 1} < r(n) < 1. \tag{6.11}$$

We sometimes write this in the looser form

$$n! \sim n^n e^{-n} (2\pi n)^{1/2} \qquad \text{as } n \to \infty, \tag{6.12}$$

the symbol "$\sim$" signifying that the ratio of the two sides in (6.12)
approaches 1 as $n \to \infty$. For the binomial coefficient
$\binom{n}{k} = n!/[k!(n - k)!]$ we then obtain

$$\binom{n}{k} \sim \frac{(n - k)^k}{k!} \qquad \text{as } n \to \infty, \tag{6.13}$$

as a consequence of (6.12) and the exponential limit

$$e^{-k} = \lim_{n \to \infty} \left( 1 - \frac{k}{n} \right)^n.$$
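
The bounds in (6.11) can be examined numerically by solving (6.10) for
$r(n)$. The sketch below works with logarithms, using `math.lgamma` for
$\ln n!$, to avoid overflow; the function name and the sample values of $n$
are hypothetical.

```python
import math


def stirling_r(n: int) -> float:
    """r(n) solved from n! = n^n e^{-n} (2 pi n)^{1/2} e^{r(n)/(12 n)}."""
    log_factorial = math.lgamma(n + 1)  # ln n!
    log_leading = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    return 12 * n * (log_factorial - log_leading)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        lower = 1.0 - 1.0 / (12 * n + 1)
        print(n, lower, stirling_r(n))
```

Each computed $r(n)$ falls between the lower bound and 1, and moves toward
1 as $n$ grows, which is why the looser form (6.12) is accurate even for
moderate $n$.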
The integral

$$B(m, n) = \int_0^1 x^{m - 1} (1 - x)^{n - 1}\, dx, \tag{6.14}$$

which converges when $m$ and $n$ are positive, defines the beta function,
related to the gamma function by

$$B(m, n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)} \qquad \text{for } m > 0,\ n > 0. \tag{6.15}$$

For nonnegative integral values $m$, $n$,

$$B(m + 1, n + 1) = \int_0^1 x^m (1 - x)^n\, dx = \frac{m!\, n!}{(m + n + 1)!}. \tag{6.16}$$

For $n = 1, 2, \ldots$, the binomial theorem provides the evaluation

$$(1 - x)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^k \qquad \text{for } -\infty < x < \infty. \tag{6.17}$$

The formula may be generalized for nonintegral $n$ by appropriately
generalizing the binomial coefficient, defining for any real number
$\alpha$,

$$\binom{\alpha}{k} = \begin{cases} \dfrac{\alpha(\alpha - 1) \cdots (\alpha - k + 1)}{k!} & \text{for } k = 1, 2, \ldots, \\ 1 & \text{for } k = 0. \end{cases} \tag{6.18}$$

As a special case, for any positive integer $n$,

$$\binom{-n}{k} = (-1)^k\, \frac{n(n + 1) \cdots (n + k - 1)}{k!} = (-1)^k \binom{n + k - 1}{k}. \tag{6.19}$$

The general binomial theorem, valid for all real $\alpha$, is

$$(1 - x)^{\alpha} = \sum_{k=0}^{\infty} (-1)^k \binom{\alpha}{k} x^k \qquad \text{for } -1 < x < 1. \tag{6.20}$$

When $\alpha = -n$ for a positive integer $n$, we obtain a group of
formulas useful in dealing with geometric series. For a positive integer
$n$, in view of (6.19) and (6.20), we have

$$(1 - x)^{-n} = \sum_{k=0}^{\infty} \binom{n + k - 1}{k} x^k \qquad \text{for } |x| < 1. \tag{6.21}$$
The familiar formula

$$\sum_{k=0}^{\infty} x^k = 1 + x + x^2 + \cdots = \frac{1}{1 - x} \qquad \text{for } |x| < 1 \tag{6.22}$$

for the sum of a geometric series results from (6.21) with $n = 1$. The
cases $n = 2$ and $n = 3$ yield the formulas

$$\sum_{k=0}^{\infty} (k + 1) x^k = 1 + 2x + 3x^2 + \cdots = \frac{1}{(1 - x)^2} \qquad \text{for } |x| < 1, \tag{6.23}$$

$$\sum_{k=0}^{\infty} (k + 2)(k + 1) x^k = \frac{2}{(1 - x)^3} \qquad \text{for } |x| < 1. \tag{6.24}$$
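
The three series can be checked against their closed forms by summing
enough terms. The sketch below truncates each series at 200 terms, which is
an arbitrary choice that is ample for the sample values of $x$ used here;
the function name is hypothetical.

```python
def partial_sums(x: float, terms: int = 200):
    """Truncated versions of the series in (6.22), (6.23), and (6.24)."""
    s0 = sum(x**k for k in range(terms))
    s1 = sum((k + 1) * x**k for k in range(terms))
    s2 = sum((k + 2) * (k + 1) * x**k for k in range(terms))
    return s0, s1, s2


if __name__ == "__main__":
    for x in (0.5, -0.3):
        closed = (1 / (1 - x), 1 / (1 - x) ** 2, 2 / (1 - x) ** 3)
        print(x, partial_sums(x), closed)
```

Convergence slows as $|x|$ approaches 1, so values of $x$ near the boundary
would need many more terms.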


Sums of Numbers 


The following sums of powers of integers have simple expressions:

$$1 + 2 + \cdots + n = \frac{n(n + 1)}{2},$$

$$1^2 + 2^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6},$$

$$1^3 + 2^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4}.$$


Chapter II 


Conditional Probability and 
Conditional Expectation 


1. The Discrete Case 


The conditional probability $\Pr\{A \mid B\}$ of the event $A$ given the
event $B$ is defined by

$$\Pr\{A \mid B\} = \frac{\Pr\{A \text{ and } B\}}{\Pr\{B\}} \qquad \text{if } \Pr\{B\} > 0, \tag{1.1}$$

and is not defined, or is assigned an arbitrary value, when
$\Pr\{B\} = 0$. Let $X$ and $Y$ be random variables that can attain only
countably many different values, say $0, 1, 2, \ldots$. The conditional
probability mass function $p_{X|Y}(x \mid y)$ of $X$ given $Y = y$ is
defined by

$$p_{X|Y}(x \mid y) = \frac{\Pr\{X = x \text{ and } Y = y\}}{\Pr\{Y = y\}} \qquad \text{if } \Pr\{Y = y\} > 0,$$

and is not defined, or is assigned an arbitrary value, whenever
$\Pr\{Y = y\} = 0$. In terms of the joint and marginal probability mass
functions $p_{XY}(x, y)$ and $p_Y(y) = \sum_x p_{XY}(x, y)$, respectively,
the definition is

$$p_{X|Y}(x \mid y) = \frac{p_{XY}(x, y)}{p_Y(y)} \qquad \text{if } p_Y(y) > 0;\ x, y = 0, 1, \ldots. \tag{1.2}$$

Observe that $p_{X|Y}(x \mid y)$ is a probability mass function in $x$ for
each fixed $y$, i.e., $p_{X|Y}(x \mid y) \ge 0$ and
$\sum_{\xi} p_{X|Y}(\xi \mid y) = 1$, for all $x$, $y$.
The law of total probability takes the form

$$\Pr\{X = x\} = \sum_{y=0}^{\infty} p_{X|Y}(x \mid y)\, p_Y(y). \tag{1.3}$$

Notice in (1.3) that the points $y$ where $p_{X|Y}(x \mid y)$ is not
defined are exactly those values for which $p_Y(y) = 0$, and hence, do not
affect the computation. The lack of a complete prescription for the
conditional probability mass function, a nuisance in some instances, is
always consistent with subsequent calculations.


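Example Let $X$ have a binomial distribution with parameters $p$ and $N$,
where $N$ has a binomial distribution with parameters $q$ and $M$. What is
the marginal distribution of $X$?

We are given the conditional probability mass function

$$p_{X|N}(k \mid n) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k = 0, 1, \ldots, n,$$

and the marginal distribution

$$p_N(n) = \binom{M}{n} q^n (1 - q)^{M - n}, \qquad n = 0, 1, \ldots, M.$$

We apply the law of total probability in the form of (1.3) to obtain

$$\begin{aligned}
\Pr\{X = k\} &= \sum_{n=0}^{M} p_{X|N}(k \mid n)\, p_N(n) \\
&= \sum_{n=k}^{M} \frac{n!}{k!(n - k)!}\, p^k (1 - p)^{n - k}\, \frac{M!}{n!(M - n)!}\, q^n (1 - q)^{M - n} \\
&= \frac{M!}{k!}\, p^k (1 - q)^M \left( \frac{q}{1 - q} \right)^k \sum_{n=k}^{M} \frac{1}{(n - k)!(M - n)!}\, (1 - p)^{n - k} \left( \frac{q}{1 - q} \right)^{n - k} \\
&= \frac{M!}{k!(M - k)!}\, (pq)^k (1 - q)^{M - k} \left[ 1 + \frac{q(1 - p)}{1 - q} \right]^{M - k} \\
&= \frac{M!}{k!(M - k)!}\, (pq)^k (1 - pq)^{M - k}, \qquad k = 0, 1, \ldots, M.
\end{aligned}$$

In words, $X$ has a binomial distribution with parameters $M$ and $pq$.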
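
The conclusion of this example can be confirmed in exact rational
arithmetic by carrying out the sum in the law of total probability term by
term. The function names and the sample parameters ($M = 6$, $p = 1/3$,
$q = 3/4$) below are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def marginal_pmf(k: int, M: int, p: Fraction, q: Fraction) -> Fraction:
    """Pr{X = k} = sum over n of p_{X|N}(k|n) p_N(n), as in (1.3)."""
    return sum(binom_pmf(k, n, p) * binom_pmf(n, M, q) for n in range(k, M + 1))


if __name__ == "__main__":
    M, p, q = 6, Fraction(1, 3), Fraction(3, 4)
    for k in range(M + 1):
        # The two columns agree exactly: X is binomial with parameters M and pq.
        print(k, marginal_pmf(k, M, p, q), binom_pmf(k, M, p * q))
```

Using `Fraction` avoids rounding, so the agreement is an exact identity for
the chosen parameters rather than an approximate one.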


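Example Suppose $X$ has a binomial distribution with parameters $p$ and
$N$, where $N$ has a Poisson distribution with mean $\lambda$. What is the
marginal distribution for $X$?

Proceeding as in the previous example but now using

$$p_N(n) = \frac{\lambda^n e^{-\lambda}}{n!}, \qquad n = 0, 1, \ldots,$$

we obtain

$$\begin{aligned}
\Pr\{X = k\} &= \sum_{n=0}^{\infty} p_{X|N}(k \mid n)\, p_N(n) \\
&= \sum_{n=k}^{\infty} \frac{n!}{k!(n - k)!}\, p^k (1 - p)^{n - k}\, \frac{\lambda^n e^{-\lambda}}{n!} \\
&= \frac{\lambda^k e^{-\lambda} p^k}{k!} \sum_{n=k}^{\infty} \frac{[\lambda(1 - p)]^{n - k}}{(n - k)!} \\
&= \frac{(\lambda p)^k e^{-\lambda}}{k!}\, e^{\lambda(1 - p)} \\
&= \frac{(\lambda p)^k e^{-\lambda p}}{k!} \qquad \text{for } k = 0, 1, \ldots.
\end{aligned}$$

In words, $X$ has a Poisson distribution with mean $\lambda p$.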


Example Suppose $X$ has a negative binomial distribution with parameters
$p$ and $N$, where $N$ has the geometric distribution

$$p_N(n) = (1 - \beta)\beta^{n - 1} \qquad \text{for } n = 1, 2, \ldots.$$

What is the marginal distribution for $X$?

We are given the conditional probability mass function

$$p_{X|N}(k \mid n) = \binom{n + k - 1}{k} p^n (1 - p)^k, \qquad k = 0, 1, \ldots.$$

Using the law of total probability, we obtain

$$\begin{aligned}
\Pr\{X = k\} &= \sum_{n=1}^{\infty} p_{X|N}(k \mid n)\, p_N(n) \\
&= \sum_{n=1}^{\infty} \frac{(n + k - 1)!}{k!(n - 1)!}\, p^n (1 - p)^k (1 - \beta)\beta^{n - 1} \\
&= (1 - \beta)(1 - p)^k\, p \sum_{n=1}^{\infty} \binom{n + k - 1}{k} (\beta p)^{n - 1} \\
&= (1 - \beta)(1 - p)^k\, p\, (1 - \beta p)^{-k - 1} \\
&= \left( \frac{p - \beta p}{1 - \beta p} \right) \left( \frac{1 - p}{1 - \beta p} \right)^k \qquad \text{for } k = 0, 1, \ldots.
\end{aligned}$$

We recognize the marginal distribution of $X$ as being of geometric form.


Let $g$ be a function for which the expectation of $g(X)$ is finite. We
define the conditional expected value of $g(X)$ given $Y = y$ by the formula

$$E[g(X) \mid Y = y] = \sum_x g(x)\, p_{X|Y}(x \mid y) \qquad \text{if } p_Y(y) > 0, \tag{1.4}$$

and the conditional mean is not defined at values $y$ for which
$p_Y(y) = 0$. The law of total probability for conditional expectation reads

$$E[g(X)] = \sum_y E[g(X) \mid Y = y]\, p_Y(y). \tag{1.5}$$

The conditional expected value $E[g(X) \mid Y = y]$ is a function of the
real variable $y$. If we evaluate this function at the random variable $Y$,
we obtain a random variable that we denote by $E[g(X) \mid Y]$. The law of
total probability in (1.5) now may be written in the form

$$E[g(X)] = E\{E[g(X) \mid Y]\}. \tag{1.6}$$
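
A tiny worked instance may help fix the definitions in (1.4) and (1.5). The
joint probability mass function below is invented purely for illustration,
as are the function names; the code conditions on each value of $Y$ and
then averages the conditional expectations.

```python
from fractions import Fraction

# Hypothetical joint pmf p_XY(x, y), chosen only for illustration.
JOINT = {
    (0, 0): Fraction(1, 8),
    (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4),
    (2, 1): Fraction(1, 4),
}


def marginal_y(joint, y):
    """p_Y(y), the sum of the joint pmf over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)


def cond_expectation(joint, g, y):
    """E[g(X) | Y = y] as in (1.4); defined only when p_Y(y) > 0."""
    p_y = marginal_y(joint, y)
    if p_y == 0:
        raise ValueError("conditional expectation undefined: p_Y(y) = 0")
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / p_y


def total_expectation(joint, g):
    """E[g(X)] as the sum over y of E[g(X) | Y = y] p_Y(y), as in (1.5)."""
    ys = {y for (_, y) in joint}
    return sum(cond_expectation(joint, g, y) * marginal_y(joint, y) for y in ys)


if __name__ == "__main__":
    square = lambda x: x * x
    direct = sum(square(x) * p for (x, _), p in JOINT.items())
    print(cond_expectation(JOINT, square, 0), cond_expectation(JOINT, square, 1))
    print(total_expectation(JOINT, square), direct)
```

The values of $y$ with $p_Y(y) = 0$ never enter the sum, which mirrors the
remark that the undefined cases do not affect the computation.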


Since the conditional expectation of $g(X)$ given $Y = y$ is the
expectation with respect to the conditional probability mass function
$p_{X|Y}(x \mid y)$, conditional expectations behave in many ways like
ordinary expectations. The following list summarizes some properties of
conditional expectations. In this list, with or without affixes, $X$ and
$Y$ are jointly distributed random variables; $c$ is a real number; $g$ is
a function for which $E[|g(X)|] < \infty$; $h$ is a bounded function; and
$v$ is a function of two variables for which $E[|v(X, Y)|] < \infty$. The
properties are

(1) $E[c_1 g_1(X_1) + c_2 g_2(X_2) \mid Y = y] = c_1 E[g_1(X_1) \mid Y = y] + c_2 E[g_2(X_2) \mid Y = y]$. (1.7)

(2) if $g \ge 0$, then $E[g(X) \mid Y = y] \ge 0$. (1.8)

(3) $E[v(X, Y) \mid Y = y] = E[v(X, y) \mid Y = y]$. (1.9)

(4) $E[g(X) \mid Y = y] = E[g(X)]$ if $X$ and $Y$ are independent. (1.10)

(5) $E[g(X)h(Y) \mid Y = y] = h(y)E[g(X) \mid Y = y]$. (1.11)

(6) $E[g(X)h(Y)] = \sum_y h(y)E[g(X) \mid Y = y]\, p_Y(y) = E\{h(Y)E[g(X) \mid Y]\}$. (1.12)

As a consequence of (1.7), (1.11), and (1.12), with either $g \equiv 1$ or
$h \equiv 1$, we obtain

$$E[c \mid Y = y] = c, \tag{1.13}$$

$$E[h(Y) \mid Y = y] = h(y), \tag{1.14}$$

$$E[g(X)] = \sum_y E[g(X) \mid Y = y]\, p_Y(y) = E\{E[g(X) \mid Y]\}. \tag{1.15}$$
y 


Exercises 


1.1. I roll a six-sided die and observe the number $N$ on the uppermost
face. I then toss a fair coin $N$ times and observe $X$, the total number
of heads to appear. What is the probability that $N = 3$ and $X = 2$? What
is the probability that $X = 5$? What is $E[X]$, the expected number of
heads to appear?


1.2. Four nickels and six dimes are tossed, and the total number N of 
heads is observed. If N = 4, what is the conditional probability that ex- 
actly two of the nickels were heads? 




1.3. A poker hand of five cards is dealt from a normal deck of 52 cards.
Let $X$ be the number of aces in the hand. Determine $\Pr\{X > 1 \mid X \ge 1\}$.
This is the probability that the hand contains more than one ace, given that 
it has at least one ace. Compare this with the probability that the hand con- 
tains more than one ace, given that it contains the ace of spades. 


1.4. A six-sided die is rolled, and the number N on the uppermost face 
is recorded. From a jar containing 10 tags numbered 1, 2,..., 10 we then 
select N tags at random without replacement. Let X be the smallest num- 
ber on the drawn tags. Determine Pr{X = 2}. 


1.5. Let X be a Poisson random variable with parameter λ. Find the
conditional mean of X given that X is odd.


1.6. Suppose U and V are independent and follow the geometric distribution

$$p(k) = p(1 - p)^k \quad \text{for } k = 0, 1, \ldots.$$

Define the random variable Z = U + V.

(a) Determine the joint probability mass function
$p_{U,Z}(u, z) = \Pr\{U = u, Z = z\}$.
(b) Determine the conditional probability mass function for U given
that Z = n.
Problems 


1.1. Let M have a binomial distribution with parameters N and p.
Conditioned on M, the random variable X has a binomial distribution with
parameters M and π.


(a) Determine the marginal distribution for X. 
(b) Determine the covariance between X and Y = M — X. 


1.2. A card is picked at random from N cards labeled 1, 2, ..., N, and
the number that appears is X. A second card is picked at random from
cards numbered 1, 2, ..., X and its number is Y. Determine the
conditional distribution of X given Y = y, for y = 1, 2, ....


1.3. Let X and Y denote the respective outcomes when two fair dice are
thrown. Let U = min{X, Y}, V = max{X, Y}, and S = U + V, T = V − U.


(a) Determine the conditional probability mass function for U given
V = v.
(b) Determine the joint mass function for S and T. 


1.4. Suppose that X has a binomial distribution with parameters p = 3 
and N, where N is also random and follows a binomial distribution with 
parameters g = ; and M = 20. What is the mean of X? 


1.5. A nickel is tossed 20 times in succession. Every time that the nickel
comes up heads, a dime is tossed. Let X count the number of heads
appearing on tosses of the dime. Determine Pr{X = 0}.


1.6. A dime is tossed repeatedly until a head appears. Let N be the trial 
number on which this first head occurs. Then a nickel is tossed N times. 
Let X count the number of times that the nickel comes up tails. Determine 
Pr{X = 0}, Pr{X = 1}, and E[X]. 


1.7. The probability that an airplane accident that is due to structural 
failure is correctly diagnosed is 0.85, and the probability that an airplane 
accident that is not due to structural failure is incorrectly diagnosed as 
being due to structural failure is 0.35. If 30 percent of all airplane acci- 
dents are due to structural failure, then find the probability that an airplane 
accident is due to structural failure given that it has been diagnosed as due 
to structural failure. 


1.8. Initially an urn contains one red and one green ball. A ball is drawn 
at random from the urn, observed, and then replaced. If this ball is red, 
then an additional red ball is placed in the urn. If the ball is green, then a 
green ball is added. A second ball is drawn. Find the conditional proba- 
bility that the first ball was red given that the second ball drawn was red. 


1.9. Let N have a Poisson distribution with parameter λ = 1.
Conditioned on N = n, let X have a uniform distribution over the integers
0, 1, ..., n + 1. What is the marginal distribution for X?




1.10. Do men have more sisters than women have? In a certain society,
all married couples use the following strategy to determine the number
of children that they will have: If the first child is a girl, they have no
more children. If the first child is a boy, they have a second child. If the
second child is a girl, they have no more children. If the second child is a
boy, they have exactly one additional child. (We ignore twins, assume
sexes are equally likely, and assume the sexes of distinct children are
independent random variables, etc.) (a) What is the probability
distribution for the number of children in a family? (b) What is the
probability distribution for the number of girl children in a family?
(c) A male child is chosen at random from all of the male children in the
population. What is the probability distribution for the number of sisters
of this child? What is the probability distribution for the number of his
brothers?


2. The Dice Game Craps 


An analysis of the dice game known as craps provides an educational ex- 
ample of the use of conditional probability in stochastic modeling. In 
craps, two dice are rolled and the sum of their uppermost faces is ob- 
served. If the sum has value 2, 3, or 12, the player loses immediately. If 
the sum is 7 or 11, the player wins. If the sum is 4, 5, 6, 8, 9, or 10, then
further rolls are required to resolve the game. In the case where the sum 
is 4, for example, the dice are rolled repeatedly until either a sum of 4 
reappears or a sum of 7 is observed. If the 4 appears first, the roller wins; 
if the seven appears first, he or she loses. 

Consider repeated rolls of the pair of dice and let $Z_n$ for n = 0, 1, ... be
the sum observed on the nth roll. Then $Z_0, Z_1, \ldots$ are independent
identically distributed random variables. If the dice are fair, the
probability mass function is

$$
\begin{aligned}
p_Z(2) &= p_Z(12) = \tfrac{1}{36}, & p_Z(5) &= p_Z(9) = \tfrac{4}{36},\\
p_Z(3) &= p_Z(11) = \tfrac{2}{36}, & p_Z(6) &= p_Z(8) = \tfrac{5}{36},\\
p_Z(4) &= p_Z(10) = \tfrac{3}{36}, & p_Z(7) &= \tfrac{6}{36}.
\end{aligned}
\tag{2.1}
$$

Let A denote the event that the player wins the game. By the law of total
probability,

$$\Pr\{A\} = \sum_{k=2}^{12} \Pr\{A \mid Z_0 = k\}\,p_Z(k). \tag{2.2}$$


Because $Z_0 = 2$, 3, or 12 calls for an immediate loss,
$\Pr\{A \mid Z_0 = k\} = 0$ for k = 2, 3, or 12. Similarly, $Z_0 = 7$ or 11
results in an immediate win, and thus
$\Pr\{A \mid Z_0 = 7\} = \Pr\{A \mid Z_0 = 11\} = 1$. It remains
to consider the values $Z_0 = 4$, 5, 6, 8, 9, and 10, which call for additional
rolls. Since the logic remains the same in each of these cases, we will
argue only the case in which $Z_0 = 4$. Abbreviate with
$\alpha = \Pr\{A \mid Z_0 = 4\}$. Then $\alpha$ is the probability that in
successive rolls $Z_1, Z_2, \ldots$ of a pair of dice, a sum of 4 appears
before a sum of 7. Denote this event by B, and again bring in the law of
total probability. Then

$$\alpha = \Pr\{B\} = \sum_{k=2}^{12} \Pr\{B \mid Z_1 = k\}\,p_Z(k). \tag{2.3}$$


Now $\Pr\{B \mid Z_1 = 4\} = 1$, while $\Pr\{B \mid Z_1 = 7\} = 0$. If the
first roll results in anything other than a 4 or a 7, the problem is repeated
in a statistically identical setting. That is,
$\Pr\{B \mid Z_1 = k\} = \alpha$ for $k \ne 4$ or 7. Substitution
into (2.3) results in

$$
\begin{aligned}
\alpha &= p_Z(4) \times 1 + p_Z(7) \times 0 + \sum_{k \ne 4, 7} p_Z(k) \times \alpha\\
&= p_Z(4) + [1 - p_Z(4) - p_Z(7)]\,\alpha,
\end{aligned}
$$

or

$$\alpha = \frac{p_Z(4)}{p_Z(4) + p_Z(7)}. \tag{2.4}$$


The same result may be secured by means of a longer, more computational,
method. One may partition the event B into disjoint elemental
events by writing

$$
B = \{Z_1 = 4\} \cup \{Z_1 \ne 4 \text{ or } 7,\ Z_2 = 4\}
\cup \{Z_1 \ne 4 \text{ or } 7,\ Z_2 \ne 4 \text{ or } 7,\ Z_3 = 4\} \cup \cdots,
$$

and then

$$
\Pr\{B\} = \Pr\{Z_1 = 4\} + \Pr\{Z_1 \ne 4 \text{ or } 7,\ Z_2 = 4\}
+ \Pr\{Z_1 \ne 4 \text{ or } 7,\ Z_2 \ne 4 \text{ or } 7,\ Z_3 = 4\} + \cdots.
$$

Now use the independence of $Z_1, Z_2, \ldots$ and sum a geometric series to
secure

$$
\begin{aligned}
\Pr\{B\} &= p_Z(4) + [1 - p_Z(4) - p_Z(7)]\,p_Z(4) + [1 - p_Z(4) - p_Z(7)]^2 p_Z(4) + \cdots\\
&= \frac{p_Z(4)}{p_Z(4) + p_Z(7)},
\end{aligned}
$$

in agreement with (2.4).


Extending the result just obtained to the other cases having more than
one roll, we have

$$\Pr\{A \mid Z_0 = k\} = \frac{p_Z(k)}{p_Z(k) + p_Z(7)} \quad \text{for } k = 4, 5, 6, 8, 9, 10.$$

Finally, substitution into (2.2) yields the total win probability

$$\Pr\{A\} = p_Z(7) + p_Z(11) + \sum_{k = 4, 5, 6, 8, 9, 10} \frac{p_Z(k)^2}{p_Z(k) + p_Z(7)}. \tag{2.5}$$

The numerical values for $p_Z(k)$ given in (2.1), together with (2.5),
determine the win probability

$$\Pr\{A\} = 0.49292929\cdots.$$
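The substitution of (2.1) into (2.5) is easy to carry out by machine. The following is a minimal sketch in Python using exact fractions; the function names `sum_pmf` and `craps_win_probability` are illustrative choices, not part of the text.

```python
from fractions import Fraction

def sum_pmf(die1, die2):
    """pmf of the sum of two independent dice, each given as {face: probability}."""
    pmf = {}
    for a, pa in die1.items():
        for b, pb in die2.items():
            pmf[a + b] = pmf.get(a + b, 0) + pa * pb
    return pmf

def craps_win_probability(p):
    """Win probability from (2.5), given the pmf p of the sum of the two dice."""
    win = p[7] + p[11]
    for k in (4, 5, 6, 8, 9, 10):
        win += p[k] * p[k] / (p[k] + p[7])
    return win

fair = {face: Fraction(1, 6) for face in range(1, 7)}
p_fair = sum_pmf(fair, fair)             # reproduces (2.1)
win_fair = craps_win_probability(p_fair)
print(win_fair, float(win_fair))         # 244/495, approximately 0.492929
```

The exact value 244/495 has the repeating decimal expansion 0.492929..., in agreement with the value quoted above.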


Having explained the computations, let us go on to a more interesting 
question. Suppose that the dice are not perfect cubes but are shaved so as 
to be slightly thinner in one dimension than in the other two. The numbers 
that appear on opposite faces on a single die always sum to 7. That is, 1 is 
opposite 6, 2 is opposite 5, and 3 is opposite 4. Suppose it is the 3-4
dimension that is smaller than the other two. See Figure 2.1. This will cause
3 and 4 to appear more frequently than the other faces, 1, 2, 5, and 6. To 
see this, think of the extreme case in which the 3-4 dimension is very thin, 
leading to a 3 or 4 on almost all tosses. Letting Y denote the result of
tossing a single shaved die, we postulate that the probability mass function
is given by

$$
\begin{aligned}
p_Y(3) = p_Y(4) &= \tfrac{1}{6} + 2\varepsilon \equiv p_+,\\
p_Y(1) = p_Y(2) = p_Y(5) = p_Y(6) &= \tfrac{1}{6} - \varepsilon \equiv p_-,
\end{aligned}
$$

where $\varepsilon > 0$ is a small quantity depending on the amount by which
the die has been biased.


Figure 2.1 A cubic die versus a die that has been shaved down in one dimension.
(Panels: A Cubic Die; A Shaved Die.)


If both dice are shaved in the same manner, the mass function for their
sum can be determined in a straightforward manner from the following
joint table:

It is easily seen that the probability mass function for the sum of the dice
is

$$
\begin{aligned}
p(2) &= p_-^2 = p(12),\\
p(3) &= 2p_-^2 = p(11),\\
p(4) &= p_-(p_- + 2p_+) = p(10),\\
p(5) &= 4p_+ p_- = p(9),\\
p(6) &= p_-^2 + (p_+ + p_-)^2 = p(8),\\
p(7) &= 4p_-^2 + 2p_+^2.
\end{aligned}
$$


To obtain a numerical value to compare to the win probability
0.492929... associated with fair dice, let us arbitrarily set ε = 0.02, so
that $p_- = 0.146666\cdots$ and $p_+ = 0.206666\cdots$. Then routine
substitutions according to the table lead to

p(2) = p(12) = 0.02151111, p(5) = p(9) = 0.12124445,
p(3) = p(11) = 0.04302222, p(6) = p(8) = 0.14635556, (2.6)
p(4) = p(10) = 0.08213333, p(7) = 0.17146667,

and the win probability becomes Pr{A} = 0.5029237.

The win probability of 0.4929293 with fair dice is unfavorable, that is,
is less than 1/2. With shaved dice, the win probability is favorable, now
being 0.5029237. What appears to be a slight change becomes, in fact, quite
significant when a large number of games are played. See III, Section 5.
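The shaved-dice figures can be reproduced numerically. The sketch below, in Python, builds the mass function of the sum directly from the single-die probabilities with ε = 0.02 and then applies (2.5); the helper names are illustrative, and floating-point arithmetic is assumed to be accurate enough for a seven-digit comparison.

```python
def sum_pmf(die1, die2):
    """pmf of the sum of two independent dice, each given as {face: probability}."""
    pmf = {}
    for a, pa in die1.items():
        for b, pb in die2.items():
            pmf[a + b] = pmf.get(a + b, 0.0) + pa * pb
    return pmf

def craps_win_probability(p):
    """Win probability from (2.5), given the pmf p of the sum of the two dice."""
    win = p[7] + p[11]
    for k in (4, 5, 6, 8, 9, 10):
        win += p[k] ** 2 / (p[k] + p[7])
    return win

eps = 0.02
p_plus = 1 / 6 + 2 * eps     # probability of a 3 or a 4
p_minus = 1 / 6 - eps        # probability of a 1, 2, 5, or 6
shaved = {1: p_minus, 2: p_minus, 3: p_plus, 4: p_plus, 5: p_minus, 6: p_minus}

p_shaved = sum_pmf(shaved, shaved)
win_shaved = craps_win_probability(p_shaved)
print(round(p_shaved[7], 8), round(win_shaved, 7))
```

The computed mass function agrees with (2.6), and the win probability rounds to 0.5029237.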


Exercises 


2.1. A red die is rolled a single time. A green die is rolled repeatedly. 
The game stops the first time that the sum of the two dice is either 4 or 7. 
What is the probability that the game stops with a sum of 4? 


2.2. Verify the win probability of 0.5029237 by substituting from (2.6) 
into (2.5). 


2.3. Determine the win probability when the dice are shaved on the 1-6
faces and $p_+ = 0.206666\cdots$ and $p_- = 0.146666\cdots$.

Problems


2.1. Let $X_1, X_2, \ldots$ be independent identically distributed positive
random variables whose common distribution function is F. We interpret
$X_1, X_2, \ldots$ as successive bids on an asset offered for sale. Suppose
that the policy is followed of accepting the first bid that exceeds some
prescribed number A. Formally, the accepted bid is $X_N$, where

$$N = \min\{k \ge 1;\ X_k > A\}.$$

Set $\alpha = \Pr\{X_1 > A\}$ and $M = E[X_N]$.

(a) Argue the equation

$$M = \int_A^\infty x\,dF(x) + (1 - \alpha)M$$

by considering the possibilities, either the first bid is accepted, or it
is not.
(b) Solve for M, thereby obtaining

$$M = \alpha^{-1}\int_A^\infty x\,dF(x).$$

(c) When $X_1$ has an exponential distribution with parameter λ, use the
memoryless property to deduce $M = A + \lambda^{-1}$.
(d) Verify this result by calculation in (b).


2.2. Consider a pair of dice that are unbalanced by the addition of 
weights in the following manner: Die #1 has a small piece of lead placed 
near the four side, causing the appearance of the outcome 3 more often 
than usual, while die #2 is weighted near the three side, causing the out- 
come 4 to appear more often than usual. We assign the probabilities 


Die #1
p(1) = p(2) = p(5) = p(6) = 0.166667,
p(3) = 0.186666,
p(4) = 0.146666;

Die #2
p(1) = p(2) = p(5) = p(6) = 0.166667,
p(3) = 0.146666,
p(4) = 0.186666.


Determine the win probability if the game of craps is played with these 
loaded dice. 


3. Random Sums 


Sums of the form $X = \xi_1 + \cdots + \xi_N$, where N is random, arise
frequently and in varied contexts. Our study of random sums begins with a
crisp definition and a precise statement of the assumptions effective in this
section, followed by some quick examples.

We postulate a sequence $\xi_1, \xi_2, \ldots$ of independent and identically
distributed random variables. Let N be a discrete random variable,
independent of $\xi_1, \xi_2, \ldots$ and having the probability mass function
$p_N(n) = \Pr\{N = n\}$ for n = 0, 1, .... Define the random sum X by

$$
X = \begin{cases}
0 & \text{if } N = 0,\\
\xi_1 + \cdots + \xi_N & \text{if } N > 0.
\end{cases}
\tag{3.1}
$$

We save space by abbreviating (3.1) to simply $X = \xi_1 + \cdots + \xi_N$,
understanding that X = 0 whenever N = 0.


Examples 


(a) Queueing Let N be the number of customers arriving at a service
facility in a specified period of time, and let $\xi_i$ be the service time
required by the ith customer. Then $X = \xi_1 + \cdots + \xi_N$ is the total
demand for service time.

(b) Risk Theory Suppose that a total of N claims arrives at an
insurance company in a given week. Let $\xi_i$ be the amount of the
ith claim. Then the total liability of the insurance company is
$X = \xi_1 + \cdots + \xi_N$.

(c) Population Models Let N be the number of plants of a given
species in a specified area, and let $\xi_i$ be the number of seeds
produced by the ith plant. Then $X = \xi_1 + \cdots + \xi_N$ gives the total
number of seeds produced in the area.

(d) Biometrics A wildlife sampling scheme traps a random number N
of a given species. Let $\xi_i$ be the weight of the ith specimen. Then
$X = \xi_1 + \cdots + \xi_N$ is the total weight captured.

When $\xi_1, \xi_2, \ldots$ are discrete random variables, the necessary
background in conditional probability was covered in Section 1. In order to
study the random sum $X = \xi_1 + \cdots + \xi_N$ when $\xi_1, \xi_2, \ldots$
are continuous random variables, we need to extend our knowledge of
conditional distributions.


3.1. Conditional Distributions: The Mixed Case 


Let X and N be jointly distributed random variables and suppose that the
possible values for N are the discrete set n = 0, 1, 2, .... Then the
elementary definition of conditional probability (1.1) applies to define the
conditional distribution function $F_{X|N}(x|n)$ of the random variable X
given that N = n to be

$$F_{X|N}(x|n) = \frac{\Pr\{X \le x \text{ and } N = n\}}{\Pr\{N = n\}} \quad \text{if } \Pr\{N = n\} > 0, \tag{3.2}$$

and the conditional distribution function is not defined at values of n for
which Pr{N = n} = 0. It is elementary to verify that $F_{X|N}(x|n)$ is a
probability distribution function in x at each value of n for which it is
defined.

The case in which X is a discrete random variable was covered in
Section 1. Now let us suppose that X is continuous and that $F_{X|N}(x|n)$
is differentiable in x at each value of n for which Pr{N = n} > 0. We define
the conditional probability density function $f_{X|N}(x|n)$ for the random
variable X given that N = n by setting

$$f_{X|N}(x|n) = \frac{d}{dx}F_{X|N}(x|n) \quad \text{if } \Pr\{N = n\} > 0. \tag{3.3}$$


Again, $f_{X|N}(x|n)$ is a probability density function in x at each value of
n for which it is defined. Moreover, the conditional density as defined in
(3.3) has the appropriate properties, for example,

$$\Pr\{a \le X < b,\ N = n\} = \int_a^b f_{X|N}(x|n)\,p_N(n)\,dx \tag{3.4}$$

for a < b and where $p_N(n) = \Pr\{N = n\}$. The law of total probability
leads to the marginal probability density function for X via

$$f_X(x) = \sum_{n=0}^{\infty} f_{X|N}(x|n)\,p_N(n). \tag{3.5}$$

Suppose that g is a function for which $E[|g(X)|] < \infty$. The conditional
expectation of g(X) given that N = n is defined by

$$E[g(X) \mid N = n] = \int g(x)\,f_{X|N}(x|n)\,dx. \tag{3.6}$$

Stipulated thus, $E[g(X) \mid N = n]$ satisfies the properties listed in (1.7)
to (1.15) for the joint discrete case. For example, the law of total
probability is

$$E[g(X)] = \sum_{n=0}^{\infty} E[g(X) \mid N = n]\,p_N(n) = E\{E[g(X) \mid N]\}. \tag{3.7}$$


3.2. The Moments of a Random Sum 


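Let us assume that $\xi_k$ and N have the finite moments

$$
\begin{aligned}
E[\xi_k] &= \mu, & \operatorname{Var}[\xi_k] &= \sigma^2,\\
E[N] &= \nu, & \operatorname{Var}[N] &= \tau^2,
\end{aligned}
\tag{3.8}
$$

and determine the mean and variance for $X = \xi_1 + \cdots + \xi_N$ as
defined in (3.1). The derivation provides practice in manipulating
conditional expectations, and the results,

$$E[X] = \mu\nu, \qquad \operatorname{Var}[X] = \nu\sigma^2 + \mu^2\tau^2, \tag{3.9}$$

are useful and important. The properties of conditional expectation listed
in (1.7) to (1.15) justify the steps in the determination.
If we begin with the mean E[X], then

$$
\begin{aligned}
E[X] &= \sum_{n=0}^{\infty} E[X \mid N = n]\,p_N(n) && \text{[by (1.15)]}\\
&= \sum_{n=1}^{\infty} E[\xi_1 + \cdots + \xi_N \mid N = n]\,p_N(n) && \text{(definition of } X\text{)}\\
&= \sum_{n=1}^{\infty} E[\xi_1 + \cdots + \xi_n \mid N = n]\,p_N(n) && \text{[by (1.9)]}\\
&= \sum_{n=1}^{\infty} E[\xi_1 + \cdots + \xi_n]\,p_N(n) && \text{[by (1.10)]}\\
&= \mu\sum_{n=1}^{\infty} n\,p_N(n) = \mu\nu.
\end{aligned}
$$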



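To determine the variance, we begin with the elementary step

$$
\begin{aligned}
\operatorname{Var}[X] &= E[(X - \mu\nu)^2] = E[(X - N\mu + N\mu - \nu\mu)^2]\\
&= E[(X - N\mu)^2] + E[\mu^2(N - \nu)^2] + 2E[\mu(X - N\mu)(N - \nu)].
\end{aligned}
\tag{3.10}
$$

Then

$$
\begin{aligned}
E[(X - N\mu)^2] &= \sum_{n=0}^{\infty} E[(X - N\mu)^2 \mid N = n]\,p_N(n)\\
&= \sum_{n=1}^{\infty} E[(\xi_1 + \cdots + \xi_n - n\mu)^2 \mid N = n]\,p_N(n)\\
&= \sigma^2\sum_{n=1}^{\infty} n\,p_N(n) = \nu\sigma^2,
\end{aligned}
$$

and

$$E[\mu^2(N - \nu)^2] = \mu^2 E[(N - \nu)^2] = \mu^2\tau^2,$$

while

$$
\begin{aligned}
E[\mu(X - N\mu)(N - \nu)] &= \mu\sum_{n=0}^{\infty} E[(X - n\mu)(n - \nu) \mid N = n]\,p_N(n)\\
&= \mu\sum_{n=0}^{\infty} (n - \nu)E[(X - n\mu) \mid N = n]\,p_N(n)\\
&= 0
\end{aligned}
$$

(because $E[(X - n\mu) \mid N = n] = E[\xi_1 + \cdots + \xi_n - n\mu] = 0$).
Then (3.10) with the subsequent three calculations validates the variance of
X as stated in (3.9).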
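The moment formulas (3.9) can be confirmed on a small discrete example by computing the distribution of the random sum exactly. The following Python sketch does so with rational arithmetic; the mass functions `p_n` and `p_xi` are hypothetical values chosen only for illustration, and the function names are not from the text.

```python
from fractions import Fraction

def convolve(p, q):
    """pmf of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def random_sum_pmf(p_n, p_xi):
    """pmf of X = xi_1 + ... + xi_N, with N independent of the summands."""
    result = {}
    fixed = {0: Fraction(1)}                 # the empty sum, used when N = 0
    for n in range(max(p_n) + 1):
        if n > 0:
            fixed = convolve(fixed, p_xi)    # pmf of xi_1 + ... + xi_n
        for x, px in fixed.items():
            result[x] = result.get(x, 0) + p_n.get(n, 0) * px
    return result

def mean(p):
    return sum(x * px for x, px in p.items())

def var(p):
    m = mean(p)
    return sum((x - m) ** 2 * px for x, px in p.items())

# Hypothetical distributions for N and for a single summand.
p_n = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
p_xi = {1: Fraction(1, 3), 2: Fraction(2, 3)}

p_x = random_sum_pmf(p_n, p_xi)
mu, sigma2 = mean(p_xi), var(p_xi)
nu, tau2 = mean(p_n), var(p_n)
print(mean(p_x), mu * nu)                          # both sides of E[X] = mu * nu
print(var(p_x), nu * sigma2 + mu ** 2 * tau2)      # both sides of the variance formula
```

Because the arithmetic is exact, the two printed pairs agree identically rather than only to rounding error.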


Example The number of offspring of a given species is a random variable
having probability mass function p(k) for k = 0, 1, .... A population
begins with a single parent who produces a random number N of progeny,
each of which independently produces offspring according to p(k) to form
a second generation. Then the total number of descendants in the second
generation may be written $X = \xi_1 + \cdots + \xi_N$, where $\xi_k$ is the
number of progeny of the kth offspring of the original parent. Let
$E[N] = E[\xi_k] = \mu$ and $\operatorname{Var}[N] = \operatorname{Var}[\xi_k] = \sigma^2$.
Then

$$E[X] = \mu^2 \quad \text{and} \quad \operatorname{Var}[X] = \mu\sigma^2(1 + \mu).$$


3.3. The Distribution of a Random Sum


Suppose that the summands $\xi_1, \xi_2, \ldots$ are continuous random
variables having a probability density function f(z). For $n \ge 1$, the
probability density function for the fixed sum $\xi_1 + \cdots + \xi_n$ is the
n-fold convolution of the density f(z), denoted by $f^{(n)}(z)$ and
recursively defined by

$$f^{(1)}(z) = f(z)$$

and

$$f^{(n)}(z) = \int f^{(n-1)}(z - u)\,f(u)\,du \quad \text{for } n > 1. \tag{3.11}$$

(See I, Section 2.5 for a discussion of convolutions.) Because N and
$\xi_1, \xi_2, \ldots$ are independent, $f^{(n)}(z)$ is also the conditional
density function for $X = \xi_1 + \cdots + \xi_N$ given that $N = n \ge 1$. Let
us suppose that Pr{N = 0} = 0. Then, by the law of total probability as
expressed in (3.5), X is continuous and has the marginal density function

$$f_X(x) = \sum_{n=1}^{\infty} f^{(n)}(x)\,p_N(n). \tag{3.12}$$


Remark When N = 0 can occur with positive probability, then
$X = \xi_1 + \cdots + \xi_N$ is a random variable having both continuous and
discrete components to its distribution. Assuming that $\xi_1, \xi_2, \ldots$
are continuous with probability density function f(z), then

$$\Pr\{X = 0\} = \Pr\{N = 0\} = p_N(0),$$

while for 0 < a < b or a < b < 0,

$$\Pr\{a < X < b\} = \int_a^b \Bigl\{\sum_{n=1}^{\infty} f^{(n)}(z)\,p_N(n)\Bigr\}\,dz. \tag{3.13}$$


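Example A Geometric Sum of Exponential Random Variables In the
following computational example, suppose

$$
f(z) = \begin{cases}
\lambda e^{-\lambda z} & \text{for } z \ge 0,\\
0 & \text{for } z < 0,
\end{cases}
$$

and

$$p_N(n) = \beta(1 - \beta)^{n-1} \quad \text{for } n = 1, 2, \ldots.$$

For $n \ge 1$, the n-fold convolution of f(z) is the gamma density

$$
f^{(n)}(z) = \begin{cases}
\dfrac{\lambda^n}{(n - 1)!}\,z^{n-1}e^{-\lambda z} & \text{for } z \ge 0,\\
0 & \text{for } z < 0.
\end{cases}
$$

(See I, Section 4.4 for discussion.)
The density for $X = \xi_1 + \cdots + \xi_N$ is given, according to (3.5), by

$$
\begin{aligned}
f_X(z) &= \sum_{n=1}^{\infty} f^{(n)}(z)\,p_N(n)\\
&= \sum_{n=1}^{\infty} \frac{\lambda^n}{(n - 1)!}\,z^{n-1}e^{-\lambda z}\,\beta(1 - \beta)^{n-1}\\
&= \lambda\beta e^{-\lambda z}\sum_{n=1}^{\infty} \frac{[\lambda(1 - \beta)z]^{n-1}}{(n - 1)!}\\
&= \lambda\beta e^{-\lambda z}e^{\lambda(1 - \beta)z}\\
&= \lambda\beta e^{-\lambda\beta z}, \quad z \ge 0.
\end{aligned}
$$

Surprise! X has an exponential distribution with parameter λβ.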
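The series manipulation in this example can be checked numerically by truncating the sum in (3.5). The Python sketch below compares a partial sum of gamma densities weighted by the geometric mass function with the exponential density having parameter λβ; the values `lam = 2.0` and `beta = 0.3` and the truncation at 60 terms are arbitrary illustrative choices.

```python
import math

def gamma_density(n, lam, z):
    """n-fold convolution of the exponential density with parameter lam, for z >= 0."""
    return lam ** n * z ** (n - 1) * math.exp(-lam * z) / math.factorial(n - 1)

def random_sum_density(z, lam, beta, terms=60):
    """Truncated version of the sum over n of f^(n)(z) p_N(n), geometric p_N."""
    total = 0.0
    for n in range(1, terms + 1):
        total += gamma_density(n, lam, z) * beta * (1 - beta) ** (n - 1)
    return total

lam, beta = 2.0, 0.3
for z in (0.1, 1.0, 3.0):
    series = random_sum_density(z, lam, beta)
    closed_form = lam * beta * math.exp(-lam * beta * z)
    print(z, series, closed_form)
```

For these parameter values the truncated series and the closed form agree to within floating-point rounding, since the neglected tail of the exponential series is negligible.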


Example Stock Price Changes Stochastic models for price fluctua- 
tions of publicly traded assets were developed as early as 1900. 

Let Z denote the difference in price of a single share of a certain stock 
between the close of one trading day and the close of the next. For an 
actively traded stock, a large number of transactions take place in a single 
day, and the total daily price change is the sum of the changes over these 
individual transactions. If we assume that price changes over successive
transactions are independent random variables having a common finite
variance,¹ then the central limit theorem applies. The price change over
a large number of transactions should follow a normal, or Gaussian,
distribution.


¹ Rather strong economic arguments in support of these assumptions can be given. The
independence follows from concepts of a "perfect market," and the common variance from
notions of time stationarity.

A variety of empirical studies have supported this conclusion. For the
most part, these studies involved price changes over a fixed number of
transactions. Other studies found discrepancies in that both very small and
very large price changes occurred more frequently in the data than
suggested by normal theory. At the same time, intermediate-size price
changes were underrepresented in the data. For the most part these
studies examined price changes over fixed durations containing a random
number of transactions.

A natural question arises: Does the random number of transactions in a
given day provide a possible explanation for the departures from normality
that are observed in data of daily price changes? Let us model the daily
price change in the form

$$Z = \xi_0 + \xi_1 + \cdots + \xi_N = \xi_0 + X, \tag{3.14}$$

where $\xi_0, \xi_1, \ldots$ are independent normally distributed random
variables with common mean zero and variance $\sigma^2$, and N has a Poisson
distribution with mean λ.

We interpret N as the number of transactions during the day, $\xi_i$ for
$i \ge 1$ as the price change during the ith transaction, and $\xi_0$ as an
initial price change arising between the close of the market on one day and
the opening of the market on the next. (An obvious generalization would
allow the distribution of $\xi_0$ to differ from that of $\xi_1, \xi_2, \ldots$.)

Conditioned on N = n, the random variable $Z = \xi_0 + \xi_1 + \cdots + \xi_n$
is normally distributed with mean zero and variance $(n + 1)\sigma^2$. The
conditional density function is

$$f_{Z|N}(z|n) = \frac{1}{\sqrt{2\pi(n + 1)\sigma^2}}\exp\Bigl\{-\frac{z^2}{2(n + 1)\sigma^2}\Bigr\}.$$

Since the probability mass function for N is

$$p_N(n) = \frac{\lambda^n e^{-\lambda}}{n!}, \quad n = 0, 1, \ldots,$$

using (3.12) we determine the probability density function for the daily
price change to be

$$f_Z(z) = \sum_{n=0}^{\infty} f_{Z|N}(z|n)\,\frac{\lambda^n e^{-\lambda}}{n!}.$$

The formula for the density $f_Z(z)$ does not simplify. Nevertheless,
numerical calculations are possible. When λ = 1 and $\sigma^2 = \tfrac12$,
then (3.9) shows that the variance of the daily price change Z in the model
(3.14) is $\operatorname{Var}[Z] = (1 + \lambda)\sigma^2 = 1$. Thus comparing
the density $f_Z(z)$ when λ = 1 and $\sigma^2 = \tfrac12$ to a normal density
with mean zero and variance one sheds some light on the question at hand.

The calculations were carried out and are shown in Figure 3.1. 

The departure from normality that is exhibited by the random sum in
Figure 3.1 is consistent with the departure from normality shown by stock
price changes over fixed time intervals. Of course, our calculations do not
prove that the observed departure from normality is caused by the random
number of transactions in a fixed time interval. Rather, the calculations
show only that such an explanation is consistent with the data and is,
therefore, a possible cause.
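The numerical comparison described above can be sketched in a few lines of Python. The code below evaluates the Poisson mixture of normal densities for λ = 1 and σ² = 1/2 by truncating the series, and compares it with the standard normal density at a small, an intermediate, and a large value of z; the truncation at 60 terms and the evaluation points are illustrative assumptions, not the calculation behind Figure 3.1.

```python
import math

def normal_pdf(z, variance):
    """Density of a normal distribution with mean zero and the given variance."""
    return math.exp(-z * z / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def price_change_density(z, lam=1.0, sigma2=0.5, terms=60):
    """Truncated Poisson mixture of N(0, (n + 1) * sigma2) densities."""
    total = 0.0
    for n in range(terms):
        weight = math.exp(-lam) * lam ** n / math.factorial(n)
        total += weight * normal_pdf(z, (n + 1) * sigma2)
    return total

for z in (0.0, 1.0, 3.0):
    print(z, price_change_density(z), normal_pdf(z, 1.0))

# Crude Riemann-sum check that the mixture integrates to one.
step = 0.01
area = sum(price_change_density(-10 + i * step) for i in range(2001)) * step
print(area)
```

The mixture density exceeds the standard normal density near zero and in the tails and falls below it at intermediate values, which is the pattern of discrepancies described in the text.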


Figure 3.1 A standard normal density (solid line) as compared to a density for 
a random sum (dashed line). Both densities have zero mean and unit variance. 


Exercises 


3.1. A six-sided die is rolled, and the number N on the uppermost face is
recorded. Then a fair coin is tossed N times, and the total number Z of heads
to appear is observed. Determine the mean and variance of Z by viewing Z
as a random sum of N Bernoulli random variables. Determine the
probability mass function of Z, and use it to find the mean and variance of Z.


3.2. Six nickels are tossed, and the total number N of heads is observed.
Then N dimes are tossed, and the total number Z of tails among the dimes
is observed. Determine the mean and variance of Z. What is the
probability that Z = 2?


3.3. Suppose that upon striking a plate a single electron is transformed
into a number N of electrons, where N is a random variable with mean μ
and standard deviation σ. Suppose that each of these electrons strikes a
second plate and releases further electrons, independently of each other
and each with the same probability distribution as N. Let Z be the total
number of electrons emitted from the second plate. Determine the mean
and variance of Z.


3.4. A six-sided die is rolled, and the number N on the uppermost face 
is recorded. From a jar containing 10 tags numbered 1, 2,..., 10 we then 
select N tags at random without replacement. Let X be the smallest num- 
ber on the drawn tags. Determine Pr{X = 2} and E[X]. 


3.5. The number of accidents occurring in a factory in a week is a Pois- 
son random variable with mean 2. The number of individuals injured in 
different accidents is independently distributed, each with mean 3 and 
variance 4. Determine the mean and variance of the number of individu- 
als injured in a week. 


Problems 


3.1. The following experiment is performed: An observation is made of
a Poisson random variable N with parameter λ. Then N independent
Bernoulli trials are performed, each with probability p of success. Let Z
be the total number of successes observed in the N trials.


(a) Formulate Z as a random sum and thereby determine its mean and 
variance. 
(b) What is the distribution of Z? 


3.2. For each given p, let Z have a binomial distribution with parameters
p and N. Suppose that N is itself binomially distributed with parameters
q and M. Formulate Z as a random sum and show that Z has a binomial
distribution with parameters pq and M.


3.3. Suppose that $\xi_1, \xi_2, \ldots$ are independent and identically
distributed with $\Pr\{\xi_k = \pm 1\} = \tfrac12$. Let N be independent of
$\xi_1, \xi_2, \ldots$ and follow the geometric probability mass function

$$p_N(k) = \alpha(1 - \alpha)^k \quad \text{for } k = 0, 1, \ldots,$$

where 0 < α < 1. Form the random sum $Z = \xi_1 + \cdots + \xi_N$.


(a) Determine the mean and variance of Z.
(b) Evaluate the higher moments $m_3 = E[Z^3]$ and $m_4 = E[Z^4]$.


Hint: Express $Z^4$ in terms of the $\xi_i$'s where $\xi_i^2 = 1$ and
$E[\xi_i\xi_j] = 0$ for $i \ne j$.


3.4. Suppose $\xi_1, \xi_2, \ldots$ are independent and identically
distributed random variables having mean μ and variance $\sigma^2$. Form the
random sum $S_N = \xi_1 + \cdots + \xi_N$.


(a) Derive the mean and variance of $S_N$ when N has a Poisson
distribution with parameter λ.

(b) Determine the mean and variance of $S_N$ when N has a geometric
distribution with mean λ = (1 − p)/p.

(c) Compare the behaviors in (a) and (b) as λ → ∞.


3.5. To form a slightly different random sum, let $\xi_0, \xi_1, \ldots$ be
independent identically distributed random variables and let N be a
nonnegative integer-valued random variable, independent of
$\xi_0, \xi_1, \ldots$. The first two moments are

$$
\begin{aligned}
E[\xi_k] &= \mu, & \operatorname{Var}[\xi_k] &= \sigma^2,\\
E[N] &= \nu, & \operatorname{Var}[N] &= \tau^2.
\end{aligned}
$$

Determine the mean and variance of the random sum
$Z = \xi_0 + \cdots + \xi_N$.


4. Conditioning on a Continuous Random Variable²


Let X and Y be jointly distributed continuous random variables with joint
probability density function $f_{X,Y}(x, y)$. We define the conditional
probability density function $f_{X|Y}(x|y)$ for the random variable X given
that Y = y by the formula

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \quad \text{if } f_Y(y) > 0, \tag{4.1}$$

and the conditional density is not defined at values y for which
$f_Y(y) = 0$.


² The reader may wish to defer reading this section until encountering VII, on renewal
processes, where conditioning on a continuous random variable first appears.


The conditional distribution function for X given Y = y is defined by

$$F_{X|Y}(x|y) = \int_{-\infty}^{x} f_{X|Y}(\xi|y)\,d\xi \quad \text{if } f_Y(y) > 0. \tag{4.2}$$

Finally, given a function g for which $E[|g(X)|] < \infty$, the conditional
expectation of g(X) given that Y = y is defined to be

$$E[g(X) \mid Y = y] = \int g(x)\,f_{X|Y}(x|y)\,dx \quad \text{if } f_Y(y) > 0. \tag{4.3}$$


The definitions given in (4.1) to (4.3) are a significant extension of our
elementary notions of conditional probability because they allow us to
condition on certain events having zero probability. To understand the
distinction, try to apply the elementary formula

$$\Pr\{A \mid B\} = \frac{\Pr\{A \text{ and } B\}}{\Pr\{B\}} \quad \text{if } \Pr\{B\} > 0 \tag{4.4}$$

to evaluate the conditional probability $\Pr\{a < X \le b \mid Y = y\}$. We set
$A = \{a < X \le b\}$ and $B = \{Y = y\}$. But Y is a continuous random
variable, and thus Pr{B} = Pr{Y = y} = 0, and (4.4) cannot be applied.
Equation (4.2) saves the day, yielding

$$\Pr\{a < X \le b \mid Y = y\} = F_{X|Y}(b|y) - F_{X|Y}(a|y) = \int_a^b f_{X|Y}(\xi|y)\,d\xi, \tag{4.5}$$

provided only that the density $f_Y(y)$ is strictly positive at the point y.

To emphasize the important advance being made, we consider the fol- 
lowing simple problem. A woman arrives at a bus stop at a time Y that is 
uniformly distributed between 0 (noon) and 1. Independently, the bus ar- 
rives at a time Z that is also uniformly distributed between 0 and 1. Given 
that the woman arrives at time Y = 0.20, what is the probability that she 
misses the bus? 

On the one hand, the answer Pr{Z < Y | Y = 0.20} = 0.20 is obvious.
On the other hand, this elementary question cannot be answered by the
elementary conditional probability formula (4.4) because the event
{Y = 0.20} has zero probability. To apply (4.1), start with the joint
density function

$$
f_{Z,Y}(z, y) = \begin{cases}
1 & \text{for } 0 \le z, y \le 1,\\
0 & \text{elsewhere,}
\end{cases}
$$

and change variables according to X = Y − Z. Then

$$f_{X,Y}(x, y) = 1 \quad \text{for } 0 \le y \le 1,\ y - 1 \le x \le y,$$

and, applying (4.1), we find that

$$f_{X|Y}(x|0.20) = \frac{f_{X,Y}(x, 0.20)}{f_Y(0.20)} = 1 \quad \text{for } -0.80 \le x \le 0.20.$$

Finally,

$$\Pr\{Z < Y \mid Y = 0.20\} = \Pr\{X > 0 \mid Y = 0.20\} = \int_0^{\infty} f_{X|Y}(x|0.20)\,dx = 0.20.$$

We see that the definition in (4.1) leads to the intuitively correct answer.
The conditional density function that is prescribed by (4.1) possesses
all of the properties that are called for by our intuition and the basic
concept of conditional probability. In particular, one can calculate the
probability of joint events by the formula

$$\Pr\{a < X < b,\ c < Y < d\} = \int_c^d \Bigl\{\int_a^b f_{X|Y}(x|y)\,dx\Bigr\} f_Y(y)\,dy, \tag{4.6}$$

which becomes the law of total probability by setting c = −∞ and
d = +∞:

$$\Pr\{a < X < b\} = \int_{-\infty}^{+\infty} \Bigl\{\int_a^b f_{X|Y}(x|y)\,dx\Bigr\} f_Y(y)\,dy. \tag{4.7}$$

For the same reasons, the conditional expectation as defined in (4.3)
satisfies the requirements listed in (1.7) to (1.11). The property (1.12),
adapted to a continuous random variable Y, is written

$$
\begin{aligned}
E[g(X)h(Y)] &= E\{h(Y)E[g(X) \mid Y]\}\\
&= \int h(y)E[g(X) \mid Y = y]\,f_Y(y)\,dy,
\end{aligned}
\tag{4.8}
$$

valid for any bounded function h, and assuming $E[|g(X)|] < \infty$. When
h(y) ≡ 1, we recover the law of total probability in the form

$$E[g(X)] = E\{E[g(X) \mid Y]\} = \int E[g(X) \mid Y = y]\,f_Y(y)\,dy. \tag{4.9}$$

Both the discrete and continuous cases of (4.8) and (4.9) are contained
in the expressions

$$E[g(X)h(Y)] = E\{h(Y)E[g(X) \mid Y]\} = \int h(y)E[g(X) \mid Y = y]\,dF_Y(y), \tag{4.10}$$

and

$$E[g(X)] = E\{E[g(X) \mid Y]\} = \int E[g(X) \mid Y = y]\,dF_Y(y). \tag{4.11}$$

[See the discussion following (2.9) in I for an explanation of the symbol- 
ism in (4.10) and (4.11).] 

The following exercises provide practice in deriving conditional prob- 
ability density functions and in manipulating the law of total probability. 


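Example Suppose X and Y are jointly distributed random variables
having the density function

$$f_{X,Y}(x, y) = \frac{1}{y}\,e^{-(x/y) - y} \quad \text{for } x, y > 0.$$

We first determine the marginal density for Y, obtaining

$$
f_Y(y) = \int_0^{\infty} f_{X,Y}(x, y)\,dx
= e^{-y}\int_0^{\infty} y^{-1}e^{-x/y}\,dx = e^{-y} \quad \text{for } y > 0.
$$

Then

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = y^{-1}e^{-x/y} \quad \text{for } x, y > 0.$$

That is, conditional on Y = y, the random variable X has an exponential
distribution with parameter 1/y. It is easily seen that
$E[X \mid Y = y] = y$.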


Example For each given p, let X have a binomial distribution with pa- 
rameters p and N. Suppose that p is uniformly distributed on the interval 
(0, 1]. What is the resulting distribution of X? 

We are given the marginal distribution for $p$ and the conditional distribution for $X$. Applying the law of total probability and the beta integral I, (6.16) we obtain

$$\Pr\{X = k\} = \int_0^1 \Pr\{X = k \mid p = \xi\}\, f_p(\xi)\,d\xi$$

$$= \int_0^1 \frac{N!}{k!(N-k)!}\,\xi^k (1 - \xi)^{N-k}\,d\xi$$

$$= \frac{N!}{k!(N-k)!} \cdot \frac{k!(N-k)!}{(N+1)!}$$

$$= \frac{1}{N+1} \quad \text{for } k = 0, 1, \ldots, N.$$

That is, $X$ is uniformly distributed on the integers $0, 1, \ldots, N$.

When $p$ has the beta distribution with parameters $r$ and $s$, then similar calculations give

$$\Pr\{X = k\} = \frac{N!}{k!(N-k)!}\,\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)} \int_0^1 \xi^{r-1}(1 - \xi)^{s-1}\,\xi^k (1 - \xi)^{N-k}\,d\xi$$

$$= \binom{N}{k} \frac{\Gamma(r+s)\,\Gamma(r+k)\,\Gamma(s+N-k)}{\Gamma(r)\,\Gamma(s)\,\Gamma(N+r+s)} \quad \text{for } k = 0, 1, \ldots, N.$$
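The claim that a binomial variable with a uniformly distributed success probability is itself uniform on $0, 1, \ldots, N$ can be checked numerically. The sketch below is only an illustration: the function name `mixed_binomial_pmf`, the midpoint rule, and the choice $N = 5$ are assumptions made here, not part of the text.

```python
from math import comb

def mixed_binomial_pmf(N, k, steps=20000):
    """Approximate Pr{X = k} = integral over (0, 1) of C(N, k) xi^k (1 - xi)^(N - k),
    i.e. the law of total probability with a uniform density for p,
    using the midpoint rule with `steps` subintervals."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        xi = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += comb(N, k) * xi**k * (1.0 - xi) ** (N - k)
    return total * h

N = 5
pmf = [mixed_binomial_pmf(N, k) for k in range(N + 1)]
for k, value in enumerate(pmf):
    print(k, round(value, 6))  # each value should be close to 1 / (N + 1)
```

Each printed value should agree with $1/(N+1)$ up to the quadrature error.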
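Example A random variable $Y$ follows the exponential distribution with parameter $\theta$. Given that $Y = y$, the random variable $X$ has a Poisson distribution with mean $y$. Applying the law of total probability then yields

$$\Pr\{X = k\} = \int_0^\infty \frac{y^k e^{-y}}{k!}\,\theta e^{-\theta y}\,dy$$

$$= \frac{\theta}{k!} \int_0^\infty y^k e^{-(1+\theta)y}\,dy$$

$$= \frac{\theta}{k!\,(1+\theta)^{k+1}} \int_0^\infty u^k e^{-u}\,du$$

$$= \frac{\theta}{(1+\theta)^{k+1}} \quad \text{for } k = 0, 1, \ldots.$$

Suppose that $Y$ has the gamma density

$$f_Y(y) = \frac{\theta}{\Gamma(\alpha)}\,(\theta y)^{\alpha - 1} e^{-\theta y}, \quad y \ge 0.$$

Then similar calculations yield

$$\Pr\{X = k\} = \int_0^\infty \frac{y^k e^{-y}}{k!}\,\frac{\theta}{\Gamma(\alpha)}\,(\theta y)^{\alpha - 1} e^{-\theta y}\,dy$$

$$= \frac{\theta^\alpha}{k!\,\Gamma(\alpha)\,(1+\theta)^{k+\alpha}} \int_0^\infty u^{k+\alpha-1} e^{-u}\,du$$

$$= \frac{\Gamma(\alpha + k)}{k!\,\Gamma(\alpha)} \left(\frac{\theta}{1+\theta}\right)^{\alpha} \left(\frac{1}{1+\theta}\right)^{k}, \quad k = 0, 1, \ldots.$$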


This is the negative binomial distribution. 
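As a numerical sanity check of the Poisson-gamma calculation, the following sketch integrates the Poisson probability against the gamma density and compares the result with the negative binomial formula. The function names, the truncation point `upper`, and the parameter values $\alpha = 2.5$, $\theta = 0.5$ are illustrative assumptions, not taken from the text.

```python
from math import exp, factorial, gamma

def gamma_density(y, alpha, theta):
    """Gamma density (theta / Gamma(alpha)) (theta y)^(alpha - 1) e^(-theta y) for y > 0."""
    return theta / gamma(alpha) * (theta * y) ** (alpha - 1) * exp(-theta * y)

def mixed_poisson_pmf(k, alpha, theta, upper=60.0, steps=50000):
    """Law of total probability: integrate Pr{X = k | Y = y} f_Y(y) over (0, upper)
    with the midpoint rule. The tail beyond `upper` is assumed negligible."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y**k * exp(-y) / factorial(k) * gamma_density(y, alpha, theta)
    return total * h

def negative_binomial_pmf(k, alpha, theta):
    """Closed form obtained in the example."""
    return (gamma(alpha + k) / (factorial(k) * gamma(alpha))
            * (theta / (1 + theta)) ** alpha * (1 / (1 + theta)) ** k)

alpha, theta = 2.5, 0.5  # illustrative values
for k in range(4):
    print(k, mixed_poisson_pmf(k, alpha, theta), negative_binomial_pmf(k, alpha, theta))
```

With $\alpha = 1$ the closed form reduces to $\theta/(1+\theta)^{k+1}$, the result for an exponentially distributed $Y$.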


Exercises 


4.1. Suppose that three contestants on a quiz show are each given the same question, and that each answers it correctly, independently of the others, with probability $p$. But the difficulty of the question is itself a random variable, so let us suppose, for the sake of illustration, that $p$ is uniformly distributed over the interval (0, 1]. What is the probability that exactly two of the contestants answer the question correctly?


4.2. Suppose that three components in a certain system each function 
with probability p and fail with probability 1 — p, each component oper- 
ating or failing independently of the others. But the system is in a random 
environment, so that p is itself a random variable. Suppose that p is uni- 
formly distributed over the interval (0, 1]. The system operates if at least 
two of the components operate. What is the probability that the system 
operates? 


4.3. A random variable $T$ is selected that is uniformly distributed over the interval (0, 1]. Then a second random variable $U$ is chosen, uniformly distributed on the interval (0, $T$]. What is the probability that $U$ exceeds $\frac{1}{2}$?


4.4. Suppose $X$ and $Y$ are independent random variables, each exponentially distributed with parameter $\lambda$. Determine the probability density function for $Z = X/Y$.


4.5. Let $U$ be uniformly distributed over the interval $[0, L]$ where $L$ follows the gamma density $f_L(x) = x e^{-x}$ for $x \ge 0$. What is the joint density function of $U$ and $V = L - U$?


Problems 


4.1. Suppose that the outcome $X$ of a certain chance mechanism depends on a parameter $p$ according to $\Pr\{X = 1\} = p$ and $\Pr\{X = 0\} = 1 - p$, where $0 \le p \le 1$. Suppose that $p$ is chosen at random, uniformly distributed over the unit interval [0, 1], and then that two independent outcomes $X_1$ and $X_2$ are observed. What is the unconditional correlation coefficient between $X_1$ and $X_2$?


Note: Conditionally independent random variables may become de- 
pendent if they share a common parameter. 


4.2. Let $N$ have a Poisson distribution with parameter $\lambda > 0$. Suppose that, conditioned on $N = n$, the random variable $X$ is binomially distributed with parameters $N = n$ and $p$. Set $Y = N - X$. Show that $X$ and $Y$ have Poisson distributions with respective parameters $\lambda p$ and $\lambda(1 - p)$ and that $X$ and $Y$ are independent.


Note: Conditionally dependent random variables may become inde- 
pendent through randomization. 


4.3. Let $X$ have a Poisson distribution with parameter $\lambda > 0$. Suppose $\lambda$ itself is random, following an exponential density with parameter $\theta$.


(a) What is the marginal distribution of $X$?
(b) Determine the conditional density for $\lambda$ given $X = k$.


4.4. Suppose $X$ and $Y$ are independent random variables having the same Poisson distribution with parameter $\lambda$, but where $\lambda$ is also random, being exponentially distributed with parameter $\theta$. What is the conditional distribution for $X$ given that $X + Y = n$?


4.5. Let $X$ and $Y$ be jointly distributed random variables whose joint probability mass function is given in the following table:


wI— win 


7) 
9 
1 
9 
0 


p(x, y) = Pr{X = x, Y= y} 


Show that the covariance between X and Y is zero even though X and Y 
are not independent. 


4.6. Let $X_0, X_1, X_2, \ldots$ be independent identically distributed nonnegative random variables having a continuous distribution. Let $N$ be the first index $k$ for which $X_k > X_0$. That is, $N = 1$ if $X_1 > X_0$, $N = 2$ if $X_1 \le X_0$ and $X_2 > X_0$, etc. Determine the probability mass function for $N$ and the mean $E[N]$. (Interpretation: $X_0, X_1, \ldots$ are successive offers or bids on a car that you are trying to sell. Then $N$ is the index of the first bid that is better than the initial bid.)


4.7. Suppose that $X$ and $Y$ are independent random variables, each having the same exponential distribution with parameter $\alpha$. What is the conditional probability density function for $X$, given that $Z = X + Y = z$?


4.8. Let $X$ and $Y$ have the joint normal density given in I, (4.16). Show that the conditional density function for $X$, given that $Y = y$, is normal with moments

$$\mu_{X|Y} = \mu_X + \frac{\rho\,\sigma_X}{\sigma_Y}\,(y - \mu_Y)$$

and

$$\sigma_{X|Y} = \sigma_X \sqrt{1 - \rho^2}.$$


5. Martingales*


Stochastic processes are characterized by the dependence relationships 
that exist among their variables. The martingale property is one such re- 
lationship that captures a notion of a game being fair. The martingale 
property is a restriction solely on the conditional means of some of the 
variables, given values of others, and does not otherwise depend on the 
actual distribution of the random variables in the stochastic process. De- 
spite the apparent weakness of the martingale assumption, the conse- 
quences are striking, as we hope to suggest. 


5.1. The Definition 


We begin the presentation with the simplest definition. 


Definition A stochastic process $\{X_n;\ n = 0, 1, \ldots\}$ is a martingale if for $n = 0, 1, \ldots$,

(a) $E[|X_n|] < \infty$,

and

(b) $E[X_{n+1} \mid X_0, \ldots, X_n] = X_n$.

Taking expectations on both sides of (b),

$$E\{E[X_{n+1} \mid X_0, \ldots, X_n]\} = E\{X_n\},$$

and using the law of total probability in the form

$$E\{E[X_{n+1} \mid X_0, \ldots, X_n]\} = E[X_{n+1}]$$

shows that

$$E[X_{n+1}] = E[X_n],$$

and consequently, a martingale has constant mean:

$$E[X_0] = E[X_k] = E[X_n], \quad 0 \le k \le n. \tag{5.1}$$


* Some problems scattered throughout the text call for the student to identify certain stochastic processes as martingales. Otherwise, the material of this section is not used in the sequel.

A similar conditioning (see Problem 5.1) verifies that the martingale equality (b) extends to future times in the form

$$E[X_m \mid X_0, \ldots, X_n] = X_n \quad \text{for } m \ge n. \tag{5.2}$$


To relate the martingale property to concepts of fairness in gambling, 
consider $X_n$ to be a certain player’s fortune after the nth play of a game. 
The game is “fair” if on average, the player’s fortune neither increases nor 
decreases at each play. The martingale property (b) requires the player’s 
fortune after the next play to equal, on average, his current fortune and not 
be otherwise affected by previous history. Some early work in martingale 
theory was motivated in part by problems in gambling. For example, mar- 
tingale systems theorems consider whether an astute choice of betting 
strategy can turn a fair game into a favorable one, and the name “martin- 
gale” derives from a French term for the particular strategy of doubling 
one’s bets until a win is secured. While it remains popular to illustrate 
martingale concepts with gambling examples, today, martingale theory 
has such broad scope and diverse applications that to think of it purely in 
terms of gambling would be unduly restrictive and misleading. 


Example Stock Prices in a Perfect Market Let $X_n$ be the closing price at the end of day $n$ of a certain publicly traded security such as a share of stock. While daily prices may fluctuate, many scholars believe that, in a perfect market, these price sequences should be martingales. In a perfect market freely open to all, they argue, it should not be possible to predict with any degree of accuracy whether a future price $X_{n+1}$ will be higher or lower than the current price $X_n$. For example, if a future price could be expected to be higher, then a number of buyers would enter the market, and their demand would raise the current price $X_n$. Similarly, if a future price could be predicted as lower, a number of sellers would appear and tend to depress the current price. Equilibrium obtains where the future price cannot be predicted, on average, as higher or lower, that is, where price sequences are martingales.


5.2. The Markov Inequality 


What does the mean of a random variable tell us about its distribution? For a nonnegative random variable $X$, Markov’s inequality is $\lambda \Pr\{X \ge \lambda\} \le E[X]$, for any positive constant $\lambda$. For example, if $E[X] = 1$, then $\Pr\{X \ge 4\} \le \frac{1}{4}$, no matter what the actual distribution of $X$ is. The proof uses two properties: (i) $X \ge 0$ ($X$ is a nonnegative random variable); and (ii) $E[X\mathbf{1}\{X \ge \lambda\}] \ge \lambda \Pr\{X \ge \lambda\}$. (Recall that $\mathbf{1}(A)$ is the indicator of an event $A$ and is one if $A$ occurs and zero otherwise. See I, Section 3.1.) Then by the law of total probability,

$$E[X] = E[X\mathbf{1}\{X \ge \lambda\}] + E[X\mathbf{1}\{X < \lambda\}]$$

$$\ge E[X\mathbf{1}\{X \ge \lambda\}] \quad \text{(by (i))}$$

$$\ge \lambda \Pr\{X \ge \lambda\}, \quad \text{(by (ii))}$$


and Markov’s inequality results. 
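To see how loose or tight Markov's inequality can be, one can compare the exact tail probability with the bound $E[X]/\lambda$ for a simple distribution. The sketch below uses the roll of a fair six-sided die, an illustrative choice made here and not an example from the text; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

# Illustrative nonnegative random variable: the roll of a fair six-sided die.
values = range(1, 7)
prob = Fraction(1, 6)

mean = sum(v * prob for v in values)  # E[X]

def tail(lam):
    """Exact Pr{X >= lam}."""
    return sum(prob for v in values if v >= lam)

for lam in (2, 4, 6):
    # Markov's inequality asserts tail(lam) <= mean / lam.
    print(lam, tail(lam), mean / lam)
```

The bound holds for every positive $\lambda$, but it can exceed 1 for small $\lambda$, in which case it carries no information.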


5.3. The Maximal Inequality for Nonnegative Martingales 


Because a martingale has constant mean, Markov’s inequality applied to a nonnegative martingale immediately yields

$$\Pr\{X_n \ge \lambda\} \le \frac{E[X_0]}{\lambda}, \quad \lambda > 0.$$

We will extend the reasoning behind Markov’s inequality to achieve an inequality of far greater power:

$$\Pr\Big\{\max_{0 \le n \le m} X_n \ge \lambda\Big\} \le \frac{E[X_0]}{\lambda}. \tag{5.3}$$

Instead of limiting the probability of a large value for a single observation $X_n$, the maximal inequality (5.3) limits the probability of observing a large value anywhere in the time interval $0, \ldots, m$, and since the right side of (5.3) does not depend on the length of the interval, the maximal inequality limits the probability of observing a large value at any time in the infinite future of the martingale!

In order to prove the maximal inequality for nonnegative martingales, we need but a single additional fact: If $X$ and $Y$ are jointly distributed random variables and $B$ is an arbitrary set, then

$$E[X\mathbf{1}\{Y \text{ in } B\}] = E[E\{X|Y\}\mathbf{1}\{Y \text{ in } B\}]. \tag{5.4}$$


But (5.4) follows from the conditional expectation property (1.12), $E[g(X)h(Y)] = E\{h(Y)E[g(X)|Y]\}$, with $g(x) = x$ and $h(y) = \mathbf{1}(y \text{ in } B)$. We will have need of (5.4) with $X = X_m$ and $Y = (X_0, \ldots, X_n)$, whereupon (5.4) followed by (5.2) then justifies

$$E[X_m \mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}]$$

$$= E[E\{X_m \mid X_0, \ldots, X_n\}\mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}]$$

$$= E[X_n \mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}]. \tag{5.5}$$


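Theorem 5.1 The maximal inequality for nonnegative martingales

Let $X_0, X_1, \ldots$ be a martingale with nonnegative values; i.e., $\Pr\{X_n \ge 0\} = 1$ for $n = 0, 1, \ldots$. For any $\lambda > 0$,

$$\Pr\Big\{\max_{0 \le n \le m} X_n \ge \lambda\Big\} \le \frac{E[X_0]}{\lambda}, \quad \text{for } 0 \le n \le m \tag{5.6}$$

and

$$\Pr\Big\{\max_{n \ge 0} X_n > \lambda\Big\} \le \frac{E[X_0]}{\lambda} \quad \text{for all } n. \tag{5.7}$$


Proof Inequality (5.7) follows from (5.6) because the right side of (5.6) does not depend on $m$. We begin with the law of total probability, as in I, Section 2.1. Either the $\{X_0, \ldots, X_m\}$ sequence rises above $\lambda$ for the first time at some index $n$, or else it remains always below $\lambda$. As these possibilities are mutually exclusive and exhaustive, we apply the law of total probability to obtain

$$E[X_m] = \sum_{n=0}^{m} E[X_m \mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}] + E[X_m \mathbf{1}\{X_0 < \lambda, \ldots, X_m < \lambda\}]$$

$$\ge \sum_{n=0}^{m} E[X_m \mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}] \quad (X_m \ge 0)$$

$$= \sum_{n=0}^{m} E[X_n \mathbf{1}\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}] \quad \text{(using (5.5))}$$

$$\ge \lambda \sum_{n=0}^{m} \Pr\{X_0 < \lambda, \ldots, X_{n-1} < \lambda, X_n \ge \lambda\}$$

$$= \lambda \Pr\Big\{\max_{0 \le n \le m} X_n \ge \lambda\Big\}.$$

Because a martingale has constant mean, $E[X_m] = E[X_0]$ by (5.1), and dividing by $\lambda$ gives (5.6).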


Example A gambler begins with a unit amount of money and faces a series of independent fair games. Beginning with $X_0 = 1$, the gambler bets the amount $p$, $0 < p < 1$. If the first game is a win, which occurs with probability $\frac{1}{2}$, the gambler’s fortune is $X_1 = 1 + pX_0 = 1 + p$. If the first game is a loss, then $X_1 = 1 - pX_0 = 1 - p$. After the nth play and with a current fortune of $X_n$, the gambler wagers $pX_n$, and

$$X_{n+1} = \begin{cases} (1 + p)X_n & \text{with probability } \frac{1}{2}, \\ (1 - p)X_n & \text{with probability } \frac{1}{2}. \end{cases}$$

Then $\{X_n\}$ is a nonnegative martingale, and the maximal inequality (5.6) with $\lambda = 2$, for example, asserts that the probability that the gambler ever doubles his money is less than or equal to $\frac{1}{2}$, and this holds no matter what the game is, as long as it is fair, and no matter what fraction $p$ of his fortune is wagered at each play. Indeed, the fraction wagered may vary from play to play, as long as it is chosen without knowledge of the next outcome.
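The bound in this example can be checked exactly for a finite horizon by enumerating every win/loss sequence. The sketch below is an illustration under stated assumptions: the function name `prob_ever_reaches`, the horizon of 10 plays, and the wagered fraction $p = 1/2$ are choices made here, not taken from the text.

```python
from fractions import Fraction
from itertools import product

def prob_ever_reaches(level, p, m):
    """Exact Pr{max over 0 <= n <= m of X_n >= level} for the proportional-betting
    martingale with X_0 = 1, found by enumerating all 2^m equally likely
    win/loss sequences."""
    hits = 0
    for outcome in product((True, False), repeat=m):
        fortune = Fraction(1)
        for win in outcome:
            fortune *= (1 + p) if win else (1 - p)
            if fortune >= level:
                hits += 1
                break  # the level has been reached on this path
    return Fraction(hits, 2**m)

p = Fraction(1, 2)  # illustrative fraction of the fortune wagered at each play
for m in (2, 6, 10):
    print(m, prob_ever_reaches(2, p, m))  # never exceeds 1/2, by (5.6)
```

The computed probability grows with the horizon $m$ but stays below the maximal-inequality bound $E[X_0]/\lambda = 1/2$.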

As amply demonstrated by this example, the maximal inequality is a very strong statement. Indeed, more elaborate arguments based on the maximal and other related martingale inequalities are used to show that a nonnegative martingale converges: If $\{X_n\}$ is a nonnegative martingale, then there exists a random variable, let us call it $X_\infty$, for which $\lim_{n \to \infty} X_n = X_\infty$. We cannot guarantee the equality of the expectations in the limit, but the inequality $E[X_0] \ge E[X_\infty] \ge 0$ can be established.


Example In III, Section 8, we will introduce the branching process model for population growth. In this model, $X_n$ is the number of individuals in the population in the nth generation, and $\mu > 0$ is the mean family size, or expected number of offspring of any single individual. The mean population size in the nth generation is $X_0\mu^n$. In this branching process model, $X_n/\mu^n$ is a nonnegative martingale (see III, Problem 8.4), and the maximal inequality implies that the probability of the actual population ever exceeding ten times the mean size is less than or equal to 1/10. The nonnegative martingale convergence theorem asserts that the evolution of such a population after many generations may be described by a single random variable $X_\infty$ in the form

$$X_n \approx X_\infty \mu^n, \quad \text{for large } n.$$
Example How NOT to generate a uniformly distributed random variable An urn initially contains one red and one green ball. A ball is drawn at random and it is returned to the urn, together with another ball of the same color. This process is repeated indefinitely. After the nth play there will be a total of $n + 2$ balls in the urn. Let $R_n$ be the number of these balls that are red, and $X_n = R_n/(n + 2)$ the fraction of red balls. We claim that $\{X_n\}$ is a martingale. First, observe that

$$R_{n+1} = \begin{cases} R_n + 1 & \text{with probability } X_n, \\ R_n & \text{with probability } 1 - X_n, \end{cases}$$

so that

$$E[R_{n+1} \mid X_n] = R_n + X_n = X_n(2 + n + 1),$$

and finally,

$$E[X_{n+1} \mid X_n] = \frac{1}{n + 3}\,E[R_{n+1} \mid X_n] = \frac{2 + n + 1}{n + 3}\,X_n = X_n.$$

This verifies the martingale property, and because such a fraction is always nonnegative, indeed, between 0 and 1, there must be a random variable $X_\infty$ to which the martingale converges. We will derive the probability distribution of the random limit. It is immediate that $R_1$ is equally likely to be 1 or 2, since the first ball chosen is equally likely to be red or green. Continuing,

$$\Pr\{R_2 = 3\} = \Pr\{R_2 = 3 \mid R_1 = 2\}\Pr\{R_1 = 2\} = \left(\tfrac{2}{3}\right)\left(\tfrac{1}{2}\right) = \tfrac{1}{3};$$

$$\Pr\{R_2 = 2\} = \Pr\{R_2 = 2 \mid R_1 = 1\}\Pr\{R_1 = 1\} + \Pr\{R_2 = 2 \mid R_1 = 2\}\Pr\{R_1 = 2\} = \left(\tfrac{1}{3}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{3}\right)\left(\tfrac{1}{2}\right) = \tfrac{1}{3};$$

and since the probabilities must sum to one,

$$\Pr\{R_2 = 1\} = \tfrac{1}{3}.$$

By repeating these simple calculations, it is easy to see that

$$\Pr\{R_n = k\} = \frac{1}{n + 1} \quad \text{for } k = 1, 2, \ldots, n + 1,$$

and that therefore $X_n$ is uniformly distributed over the values $1/(n + 2), 2/(n + 2), \ldots, (n + 1)/(n + 2)$. This uniform distribution must prevail in the limit, which leads to

$$\Pr\{X_\infty \le x\} = x \quad \text{for } 0 < x < 1.$$


Think about this remarkable result for a minute! If you sit down in front 
of such an urn and play this game, eventually the fraction of red balls in 
your urn will stabilize in the near vicinity of some value, call it U. If I play 
the game, the fraction of red balls in my urn will stabilize also, but at an- 
other value, U’. Anyone who plays the game will find the fraction of red 
balls in the urn tending towards some limit, but everyone will experience 
a different limit. In fact, each play of the game generates a fresh, uni- 
formly distributed random variable, in the limit. Of course, there may 
be faster and simpler ways to generate uniformly distributed random 
variables. 
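The uniform distribution of the number of red balls can be confirmed by carrying the exact distribution forward one draw at a time. The following sketch is an illustration; the function name `red_ball_distribution` and the use of exact fractions are choices made here.

```python
from fractions import Fraction

def red_ball_distribution(n):
    """Return {k: Pr{R_n = k}} for the urn that starts with one red and one
    green ball, where each drawn ball is returned with one more of its color."""
    dist = {1: Fraction(1)}  # R_0 = 1 with certainty
    for step in range(n):
        total_balls = step + 2  # balls in the urn before this draw
        nxt = {}
        for reds, pr in dist.items():
            p_red = Fraction(reds, total_balls)  # chance of drawing a red ball
            nxt[reds + 1] = nxt.get(reds + 1, 0) + pr * p_red
            nxt[reds] = nxt.get(reds, 0) + pr * (1 - p_red)
        dist = nxt
    return dist

for n in (1, 2, 5):
    print(n, red_ball_distribution(n))  # every value equals 1 / (n + 1)
```

The mean fraction of red balls, computed from the same distribution, stays at 1/2 for every $n$, in agreement with the constant mean of a martingale.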

Martingale implications include many more inequalities and conver- 
gence theorems. As briefly mentioned at the start, there are so-called sys- 
tems theorems that delimit the conditions under which a gambling system, 
such as doubling the bets until a win is secured, can turn a fair game into 
a winning game. A deeper discussion of martingale theory would take us 
well beyond the scope of this introductory text, and our aim must be lim- 
ited to building an enthusiasm for further study. Nevertheless, a large va- 
riety of important martingales will be introduced in the Problems at the 
end of each section in the remainder of the book. 


Exercises 


5.1. Let X be an exponentially distributed random variable with mean 
E[X] = 1. For x = 0.5, 1, and 2, compare Pr{X > x} with the Markov in- 
equality bound E[X]/x. 


5.2. Let X be a Bernoulli random variable with parameter p. Compare 
Pr{X = 1} with the Markov inequality bound. 


5.3. Let $\xi$ be a random variable with mean $\mu$ and standard deviation $\sigma$. Let $X = (\xi - \mu)^2$. Apply Markov’s inequality to $X$ to deduce Chebyshev’s inequality:

$$\Pr\{|\xi - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2} \quad \text{for any } \varepsilon > 0.$$


Problems


5.1. Use the law of total probability for conditional expectations $E[E\{X \mid Y, Z\} \mid Z] = E[X \mid Z]$ to show

$$E[X_{n+2} \mid X_0, \ldots, X_n] = E[E\{X_{n+2} \mid X_0, \ldots, X_{n+1}\} \mid X_0, \ldots, X_n].$$

Conclude that when $X_n$ is a martingale,

$$E[X_{n+2} \mid X_0, \ldots, X_n] = X_n.$$


5.2. Let $U_1, U_2, \ldots$ be independent random variables each uniformly distributed over the interval (0, 1]. Show that $X_0 = 1$ and $X_n = 2^n U_1 \cdots U_n$ for $n = 1, 2, \ldots$ defines a martingale.


5.3. Let $S_0 = 0$, and for $n \ge 1$, let $S_n = \varepsilon_1 + \cdots + \varepsilon_n$ be the sum of $n$ independent random variables, each exponentially distributed with mean $E[\varepsilon] = 1$. Show that

$$X_n = 2^n \exp(-S_n), \quad n \ge 0$$

defines a martingale.


5.4. Let $\xi_1, \xi_2, \ldots$ be independent Bernoulli random variables with parameter $p$, $0 < p < 1$. Show that $X_0 = 1$ and $X_n = p^{-n}\xi_1 \cdots \xi_n$, $n = 1, 2, \ldots$, defines a nonnegative martingale. What is the limit of $X_n$ as $n \to \infty$?


5.5. Consider a stochastic process that evolves according to the following laws: If $X_n = 0$, then $X_{n+1} = 0$, whereas if $X_n > 0$, then

$$X_{n+1} = \begin{cases} X_n + 1 & \text{with probability } \frac{1}{2}, \\ X_n - 1 & \text{with probability } \frac{1}{2}. \end{cases}$$

(a) Show that $X_n$ is a nonnegative martingale.
(b) Suppose that $X_0 = i > 0$. Use the maximal inequality to bound $\Pr\{X_n \ge N \text{ for some } n \ge 0 \mid X_0 = i\}$.


Note: $X_n$ represents the fortune of a player of a fair game who wagers $1 at each bet and who is forced to quit if all money is lost ($X_n = 0$). This gambler’s ruin problem is discussed fully in III, Section 5.3.


Chapter III 
Markov Chains: Introduction 


1. Definitions 


A Markov process $\{X_t\}$ is a stochastic process with the property that, given the value of $X_t$, the values of $X_s$ for $s > t$ are not influenced by the values of $X_u$ for $u < t$. In words, the probability of any particular future behavior of the process, when its current state is known exactly, is not altered by additional knowledge concerning its past behavior. A discrete-time Markov chain is a Markov process whose state space is a finite or countable set, and whose (time) index set is $T = (0, 1, 2, \ldots)$. In formal terms, the Markov property is that

$$\Pr\{X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i\} = \Pr\{X_{n+1} = j \mid X_n = i\} \tag{1.1}$$

for all time points $n$ and all states $i_0, \ldots, i_{n-1}, i, j$.

It is frequently convenient to label the state space of the Markov chain by the nonnegative integers $\{0, 1, 2, \ldots\}$, which we will do unless the contrary is explicitly stated, and it is customary to speak of $X_n$ as being in state $i$ if $X_n = i$.

The probability of $X_{n+1}$ being in state $j$ given that $X_n$ is in state $i$ is called the one-step transition probability and is denoted by $P_{ij}^{n,n+1}$. That is,

$$P_{ij}^{n,n+1} = \Pr\{X_{n+1} = j \mid X_n = i\}. \tag{1.2}$$

The notation emphasizes that in general the transition probabilities are functions not only of the initial and final states, but also of the time of transition as well. When the one-step transition probabilities are independent of the time variable $n$, we say that the Markov chain has stationary transition probabilities. Since the vast majority of Markov chains that we shall encounter have stationary transition probabilities, we limit our discussion to this case. Then $P_{ij}^{n,n+1} = P_{ij}$ is independent of $n$, and $P_{ij}$ is the conditional probability that the state value undergoes a transition from $i$ to $j$ in one trial. It is customary to arrange these numbers $P_{ij}$ in a matrix, in the infinite square array

$$P = \begin{Vmatrix} P_{00} & P_{01} & P_{02} & P_{03} & \cdots \\ P_{10} & P_{11} & P_{12} & P_{13} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \end{Vmatrix}$$

and refer to $P = \|P_{ij}\|$ as the Markov matrix or transition probability matrix of the process.

The ith row of $P$, for $i = 0, 1, \ldots$, is the probability distribution of the values of $X_{n+1}$ under the condition that $X_n = i$. If the number of states is finite, then $P$ is a finite square matrix whose order (the number of rows) is equal to the number of states. Clearly, the quantities $P_{ij}$ satisfy the conditions

$$P_{ij} \ge 0 \quad \text{for } i, j = 0, 1, 2, \ldots, \tag{1.3}$$

$$\sum_{j=0}^{\infty} P_{ij} = 1 \quad \text{for } i = 0, 1, 2, \ldots. \tag{1.4}$$


The condition (1.4) merely expresses the fact that some transition occurs 
at each trial. (For convenience, one says that a transition has occurred 
even if the state remains unchanged.) 

A Markov process is completely defined once its transition probability matrix and initial state $X_0$ (or, more generally, the probability distribution of $X_0$) are specified. We shall now prove this fact.

Let $\Pr\{X_0 = i\} = p_i$. It is enough to show how to compute the quantities

$$\Pr\{X_0 = i_0, X_1 = i_1, X_2 = i_2, \ldots, X_n = i_n\}, \tag{1.5}$$

since any probability involving $X_{j_1}, \ldots, X_{j_k}$, for $j_1 < \cdots < j_k$, can be obtained, according to the axiom of total probability, by summing terms of the form (1.5).

By the definition of conditional probabilities we obtain

$$\Pr\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = \Pr\{X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}\} \times \Pr\{X_n = i_n \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}\}. \tag{1.6}$$

Now, by the definition of a Markov process,

$$\Pr\{X_n = i_n \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}\} = \Pr\{X_n = i_n \mid X_{n-1} = i_{n-1}\} = P_{i_{n-1}, i_n}. \tag{1.7}$$

Substituting (1.7) into (1.6) gives

$$\Pr\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = \Pr\{X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i_{n-1}\}\, P_{i_{n-1}, i_n}.$$

Then, upon repeating the argument $n - 1$ additional times, (1.5) becomes

$$\Pr\{X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n\} = p_{i_0} P_{i_0, i_1} \cdots P_{i_{n-2}, i_{n-1}} P_{i_{n-1}, i_n}. \tag{1.8}$$

This shows that all finite-dimensional probabilities are specified once the transition probabilities and initial distribution are given, and in this sense the process is defined by these quantities.
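Formula (1.8) translates directly into a short computation: multiply the initial probability of the first state by the one-step transition probabilities along the path. The sketch below is illustrative only; the function name `path_probability` and the two-state matrix are hypothetical choices made here, not taken from the text.

```python
from itertools import product

def path_probability(initial, P, path):
    """Pr{X_0 = path[0], X_1 = path[1], ..., X_n = path[n]} computed as
    p_{i_0} P_{i_0 i_1} ... P_{i_{n-1} i_n}, following (1.8)."""
    prob = initial[path[0]]
    for i, j in zip(path, path[1:]):
        prob *= P[i][j]  # one-step transition from state i to state j
    return prob

# Hypothetical two-state chain used only for illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]
initial = [0.5, 0.5]

print(path_probability(initial, P, [0, 0, 1, 1]))  # 0.5 * 0.9 * 0.1 * 0.6

# Summing (1.8) over all paths of a fixed length gives total probability one.
total = sum(path_probability(initial, P, list(path))
            for path in product(range(2), repeat=4))
print(total)
```

Probabilities of events involving only some of the time points are then obtained by summing such path probabilities over the unconstrained states.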

Related computations show that (1.1) is equivalent to the Markov property in the form

$$\Pr\{X_{n+1} = j_1, \ldots, X_{n+m} = j_m \mid X_0 = i_0, \ldots, X_n = i_n\} = \Pr\{X_{n+1} = j_1, \ldots, X_{n+m} = j_m \mid X_n = i_n\} \tag{1.9}$$

for all time points $n$, $m$ and all states $i_0, \ldots, i_n, j_1, \ldots, j_m$. In other words, once (1.9) is established for the value $m = 1$, it holds for all $m \ge 1$ as well.


Exercises

1.1. A Markov chain $X_0, X_1, \ldots$ on states 0, 1, 2 has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.1 & 0.2 & 0.7 \\ 1 & 0.9 & 0.1 & 0 \\ 2 & 0.1 & 0.8 & 0.1 \end{array}$$

and initial distribution $p_0 = \Pr\{X_0 = 0\} = 0.3$, $p_1 = \Pr\{X_0 = 1\} = 0.4$, and $p_2 = \Pr\{X_0 = 2\} = 0.3$. Determine $\Pr\{X_0 = 0, X_1 = 1, X_2 = 2\}$.


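1.2. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.7 & 0.2 & 0.1 \\ 1 & 0 & 0.6 & 0.4 \\ 2 & 0.5 & 0 & 0.5 \end{array}$$

Determine the conditional probabilities

$$\Pr\{X_2 = 1, X_3 = 1 \mid X_1 = 0\} \quad \text{and} \quad \Pr\{X_1 = 1, X_2 = 1 \mid X_0 = 0\}.$$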


1.3. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.6 & 0.3 & 0.1 \\ 1 & 0.3 & 0.3 & 0.4 \\ 2 & 0.4 & 0.1 & 0.5 \end{array}$$

If it is known that the process starts in state $X_0 = 1$, determine the probability $\Pr\{X_0 = 1, X_1 = 0, X_2 = 2\}$.


1.4. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.1 & 0.1 & 0.8 \\ 1 & 0.2 & 0.2 & 0.6 \\ 2 & 0.3 & 0.3 & 0.4 \end{array}$$

Determine the conditional probabilities

$$\Pr\{X_1 = 1, X_2 = 1 \mid X_0 = 0\} \quad \text{and} \quad \Pr\{X_2 = 1, X_3 = 1 \mid X_1 = 0\}.$$


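1.5. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.3 & 0.2 & 0.5 \\ 1 & 0.5 & 0.1 & 0.4 \\ 2 & 0.5 & 0.2 & 0.3 \end{array}$$

and initial distribution $p_0 = 0.5$ and $p_1 = 0.5$. Determine the probabilities

$$\Pr\{X_0 = 1, X_1 = 1, X_2 = 0\} \quad \text{and} \quad \Pr\{X_1 = 1, X_2 = 1, X_3 = 0\}.$$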


Problems 


1.1. A simplified model for the spread of a disease goes this way: The total population size is $N = 5$, of which some are diseased and the remainder are healthy. During any single period of time, two people are selected at random from the population and assumed to interact. The selection is such that an encounter between any pair of individuals in the population is just as likely as between any other pair. If one of these persons is diseased and the other not, then with probability $\alpha = 0.1$ the disease is transmitted to the healthy person. Otherwise, no disease transmission takes place. Let $X_n$ denote the number of diseased persons in the population at the end of the nth period. Specify the transition probability matrix.


1.2. Consider the problem of sending a binary message, 0 or 1, through a signal channel consisting of several stages, where transmission through each stage is subject to a fixed probability of error $\alpha$. Suppose that $X_0 = 0$ is the signal that is sent and let $X_n$ be the signal that is received at the nth stage. Assume that $\{X_n\}$ is a Markov chain with transition probabilities $P_{00} = P_{11} = 1 - \alpha$ and $P_{01} = P_{10} = \alpha$, where $0 < \alpha < 1$.


(a) Determine $\Pr\{X_0 = 0, X_1 = 0, X_2 = 0\}$, the probability that no error occurs up to stage $n = 2$.


(b) Determine the probability that a correct signal is received at stage 2.


Hint: This is $\Pr\{X_0 = 0, X_1 = 0, X_2 = 0\} + \Pr\{X_0 = 0, X_1 = 1, X_2 = 0\}$.


1.3. Consider a sequence of items from a production process, with each item being graded as good or defective. Suppose that a good item is followed by another good item with probability $\alpha$ and is followed by a defective item with probability $1 - \alpha$. Similarly, a defective item is followed by another defective item with probability $\beta$ and is followed by a good item with probability $1 - \beta$. If the first item is good, what is the probability that the first defective item to appear is the fifth item?


1.4. The random variables $\xi_1, \xi_2, \ldots$ are independent and with the common probability mass function

$$\begin{array}{l|cccc} k = & 0 & 1 & 2 & 3 \\ \hline \Pr\{\xi = k\} = & 0.1 & 0.3 & 0.2 & 0.4 \end{array}$$

Set $X_0 = 0$, and let $X_n = \max\{\xi_1, \ldots, \xi_n\}$ be the largest $\xi$ observed to date. Determine the transition probability matrix for the Markov chain $\{X_n\}$.


2. Transition Probability Matrices of a Markov Chain 


A Markov chain is completely defined by its one-step transition probability matrix and the specification of a probability distribution on the state of the process at time 0. The analysis of a Markov chain concerns mainly the calculation of the probabilities of the possible realizations of the process. Central in these calculations are the n-step transition probability matrices $P^{(n)} = \|P_{ij}^{(n)}\|$. Here $P_{ij}^{(n)}$ denotes the probability that the process goes from state $i$ to state $j$ in $n$ transitions. Formally,

$$P_{ij}^{(n)} = \Pr\{X_{m+n} = j \mid X_m = i\}. \tag{2.1}$$

Observe that we are dealing only with temporally homogeneous processes having stationary transition probabilities, since otherwise the left side of (2.1) would also depend on $m$.

The Markov property allows us to express (2.1) in terms of $\|P_{ij}\|$ as stated in the following theorem.


Theorem 2.1 The n-step transition probabilities of a Markov chain satisfy

$$P_{ij}^{(n)} = \sum_{k=0}^{\infty} P_{ik} P_{kj}^{(n-1)}, \tag{2.2}$$

where we define

$$P_{ij}^{(0)} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

From the theory of matrices we recognize the relation (2.2) as the formula for matrix multiplication, so that $P^{(n)} = P \times P^{(n-1)}$. By iterating this formula, we obtain

$$P^{(n)} = \underbrace{P \times P \times \cdots \times P}_{n \text{ factors}} = P^n; \tag{2.3}$$

in other words, the n-step transition probabilities $P_{ij}^{(n)}$ are the entries in the matrix $P^n$, the nth power of $P$.


Proof The proof proceeds via a first step analysis, a breaking down, or analysis, of the possible transitions on the first step, followed by an application of the Markov property. The event of going from state $i$ to state $j$ in $n$ transitions can be realized in the mutually exclusive ways of going to some intermediate state $k$ ($k = 0, 1, \ldots$) in the first transition, and then going from state $k$ to state $j$ in the remaining $(n - 1)$ transitions. Because of the Markov property, the probability of the second transition is $P_{kj}^{(n-1)}$ and that of the first is clearly $P_{ik}$. If we use the law of total probability, then (2.2) follows. The steps are

$$P_{ij}^{(n)} = \Pr\{X_n = j \mid X_0 = i\} = \sum_{k=0}^{\infty} \Pr\{X_n = j, X_1 = k \mid X_0 = i\}$$

$$= \sum_{k=0}^{\infty} \Pr\{X_1 = k \mid X_0 = i\} \Pr\{X_n = j \mid X_0 = i, X_1 = k\}$$

$$= \sum_{k=0}^{\infty} P_{ik} P_{kj}^{(n-1)}.$$

If the probability of the process initially being in state $j$ is $p_j$, i.e., the distribution law of $X_0$ is $\Pr\{X_0 = j\} = p_j$, then the probability of the process being in state $k$ at time $n$ is

$$p_k^{(n)} = \sum_{j=0}^{\infty} p_j P_{jk}^{(n)} = \Pr\{X_n = k\}. \tag{2.4}$$
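For a chain with finitely many states, relations (2.2) through (2.4) amount to repeated matrix multiplication. The following sketch illustrates this with plain Python lists; the function names and the two-state matrix are hypothetical choices made here and do not come from the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """n-step transition matrix P^(n), built by iterating P^(n) = P x P^(n-1)
    from P^(0) = identity, as in (2.2) and (2.3)."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(P, result)
    return result

def distribution_at(initial, P, n):
    """Pr{X_n = k} for each state k, computed as the sum over j of p_j P_jk^(n), as in (2.4)."""
    Pn = n_step(P, n)
    size = len(P)
    return [sum(initial[j] * Pn[j][k] for j in range(size)) for k in range(size)]

# Hypothetical two-state chain used only for illustration.
P = [[0.9, 0.1],
     [0.4, 0.6]]
print(n_step(P, 2))                       # two-step transition probabilities
print(distribution_at([0.5, 0.5], P, 1))  # distribution of X_1
```

Each row of the n-step matrix is itself a probability distribution, so its entries sum to one.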


Exercises 


2.1. A Markov chain $\{X_n\}$ on the states 0, 1, 2 has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.1 & 0.2 & 0.7 \\ 1 & 0.2 & 0.2 & 0.6 \\ 2 & 0.6 & 0.1 & 0.3 \end{array}$$

(a) Compute the two-step transition matrix $P^2$.
(b) What is $\Pr\{X_3 = 1 \mid X_1 = 0\}$?
(c) What is $\Pr\{X_3 = 1 \mid X_0 = 0\}$?


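2.2. A particle moves among the states 0, 1, 2 according to a Markov process whose transition probability matrix is

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & \frac{1}{2} & 0 & \frac{1}{2} \\ 2 & \frac{1}{2} & \frac{1}{2} & 0 \end{array}$$

Let $X_n$ denote the position of the particle at the nth move. Calculate $\Pr\{X_n = 0 \mid X_0 = 0\}$ for $n = 0, 1, 2, 3, 4$.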




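2.3. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.7 & 0.2 & 0.1 \\ 1 & 0 & 0.6 & 0.4 \\ 2 & 0.5 & 0 & 0.5 \end{array}$$

Determine the conditional probabilities

$$\Pr\{X_3 = 1 \mid X_0 = 0\} \quad \text{and} \quad \Pr\{X_4 = 1 \mid X_0 = 0\}.$$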


2.4. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.6 & 0.3 & 0.1 \\ 1 & 0.3 & 0.3 & 0.4 \\ 2 & 0.4 & 0.1 & 0.5 \end{array}$$

If it is known that the process starts in state $X_0 = 1$, determine the probability $\Pr\{X_2 = 2\}$.


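2.5. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.1 & 0.1 & 0.8 \\ 1 & 0.2 & 0.2 & 0.6 \\ 2 & 0.3 & 0.3 & 0.4 \end{array}$$

Determine the conditional probabilities

$$\Pr\{X_3 = 1 \mid X_1 = 0\} \quad \text{and} \quad \Pr\{X_2 = 1 \mid X_0 = 0\}.$$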


2.6. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.3 & 0.2 & 0.5 \\ 1 & 0.5 & 0.1 & 0.4 \\ 2 & 0.5 & 0.2 & 0.3 \end{array}$$

and initial distribution $p_0 = 0.5$ and $p_1 = 0.5$. Determine the probabilities $\Pr\{X_2 = 0\}$ and $\Pr\{X_3 = 0\}$.


Problems 


2.1. Consider the Markov chain whose transition probability matrix is given by

$$P = \begin{array}{c|cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 0.4 & 0.3 & 0.2 & 0.1 \\ 1 & 0.1 & 0.4 & 0.3 & 0.2 \\ 2 & 0.3 & 0.2 & 0.1 & 0.4 \\ 3 & 0.2 & 0.1 & 0.4 & 0.3 \end{array}$$

Suppose that the initial distribution is $p_i = \frac{1}{4}$ for $i = 0, 1, 2, 3$. Show that $\Pr\{X_n = k\} = \frac{1}{4}$, $k = 0, 1, 2, 3$, for all $n$. Can you deduce a general result from this example?


2.2. Consider the problem of sending a binary message, 0 or 1, through
a signal channel consisting of several stages, where transmission through
each stage is subject to a fixed probability of error $\alpha$. Let $X_0$ be the signal
that is sent and let $X_n$ be the signal that is received at the nth stage. Suppose
$X_n$ is a Markov chain with transition probabilities $P_{00} = P_{11} = 1 - \alpha$
and $P_{01} = P_{10} = \alpha$ $(0 < \alpha < 1)$. Determine $\Pr\{X_5 = 0 \mid X_0 = 0\}$, the
probability of correct transmission through five stages.


2.3. Let $X_n$ denote the quality of the nth item produced by a production
system with $X_n = 0$ meaning “good” and $X_n = 1$ meaning “defective.”
Suppose that $X_n$ evolves as a Markov chain whose transition probability
matrix is

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & 0.99 & 0.01 \\
1 & 0.12 & 0.88
\end{array}
$$

What is the probability that the fourth item is defective given that the first
item is defective?


2.4. Suppose $X_n$ is a two-state Markov chain whose transition probability
matrix is

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & \alpha & 1 - \alpha \\
1 & 1 - \beta & \beta
\end{array}
$$

Then $Z_n = (X_{n-1}, X_n)$ is a Markov chain having the four states (0, 0),
(0, 1), (1, 0), and (1, 1). Determine the transition probability matrix.


2.5. A Markov chain has the transition probability matrix

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 0.7 & 0.2 & 0.1 \\
1 & 0.3 & 0.5 & 0.2 \\
2 & 0 & 0 & 1
\end{array}
$$

The Markov chain starts at time zero in state $X_0 = 0$. Let

$$T = \min\{n \ge 0;\ X_n = 2\}$$

be the first time that the process reaches state 2. Eventually, the process
will reach and be absorbed into state 2. If in some experiment we observed
such a process and noted that absorption had not yet taken place, we
might be interested in the conditional probability that the process is in
state 0 (or 1), given that absorption had not yet taken place. Determine
$\Pr\{X_3 = 0 \mid X_0, T > 3\}$.


Hint: The event $\{T > 3\}$ is exactly the same as the event
$\{X_3 \ne 2\} = \{X_3 = 0\} \cup \{X_3 = 1\}$.


3. Some Markov Chain Models 


The importance of Markov chains lies in the large number of natural physical,
biological, and economic phenomena that can be described by them
and is enhanced by the amenability of Markov chains to quantitative
manipulation. In this section we give several examples of Markov chain models
that arise in various parts of science. General methods for computing
certain functionals on Markov chains are derived in the following section.


3.1. An Inventory Model


Consider a situation in which a commodity is stocked in order to satisfy a
continuing demand. We assume that the replenishment of stock takes
place at the end of periods labeled n = 0, 1, 2, ..., and we assume that
the total aggregate demand for the commodity during period n is a random
variable $\xi_n$ whose distribution function is independent of the time period,

$$\Pr\{\xi_n = k\} = a_k \quad \text{for } k = 0, 1, 2, \ldots, \tag{3.1}$$

where $a_k \ge 0$ and $\sum_{k=0}^{\infty} a_k = 1$. The stock level is examined at the end of
each period. A replenishment policy is prescribed by specifying two nonnegative
critical numbers $s$ and $S > s$ whose interpretation is: If the end-of-period
stock quantity is not greater than $s$, then an amount sufficient to
increase the quantity of stock on hand up to the level $S$ is immediately procured.
If, however, the available stock is in excess of $s$, then no replenishment
of stock is undertaken. Let $X_n$ denote the quantity on hand at the
end of period n just prior to restocking. The states of the process $\{X_n\}$ consist
of the possible values of stock size

$$S,\ S - 1,\ \ldots,\ +1,\ 0,\ -1,\ -2,\ \ldots,$$

where a negative value is interpreted as an unfilled demand that will be
satisfied immediately upon restocking.
The process $\{X_n\}$ is depicted in Figure 3.1.


Figure 3.1 The inventory process (axis label: Period).

According to the rules of the inventory policy, the stock levels at two
consecutive periods are connected by the relation

$$
X_{n+1} = \begin{cases}
X_n - \xi_{n+1} & \text{if } s < X_n \le S, \\
S - \xi_{n+1} & \text{if } X_n \le s,
\end{cases} \tag{3.2}
$$

where $\xi_n$ is the quantity demanded in the nth period, stipulated to follow
the probability law (3.1). If we assume that the successive demands
$\xi_1, \xi_2, \ldots$ are independent random variables, then the stock values
$X_0, X_1, X_2, \ldots$ constitute a Markov chain whose transition probability matrix can
be calculated in accordance with relation (3.2). Explicitly,

$$
P_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\} = \begin{cases}
\Pr\{\xi_{n+1} = i - j\} & \text{if } s < i \le S, \\
\Pr\{\xi_{n+1} = S - j\} & \text{if } i \le s.
\end{cases}
$$


Consider a spare parts inventory model as a numerical example in
which either 0, 1, or 2 repair parts are demanded in any period, with

$$\Pr\{\xi_n = 0\} = 0.5, \quad \Pr\{\xi_n = 1\} = 0.4, \quad \Pr\{\xi_n = 2\} = 0.1,$$

and suppose $s = 0$, while $S = 2$. The possible values for $X_n$ are $S = 2$, 1, 0,
and $-1$. To illustrate the transition probability calculations, we will consider
first the determination of $P_{10} = \Pr\{X_{n+1} = 0 \mid X_n = 1\}$. When $X_n = 1$,
then no replenishment takes place and the next state $X_{n+1} = 0$ results when
the demand $\xi_{n+1} = 1$, and this occurs with probability $P_{10} = 0.4$. To illustrate
another case, if $X_n = 0$, then instantaneous replenishment to $S = 2$
ensues, and a next period level of $X_{n+1} = 0$ results from the demand quantity
$\xi_{n+1} = 2$. The corresponding probability of this outcome yields
$P_{00} = 0.1$. Continuing in this manner, we obtain the transition probability
matrix

$$
P = \begin{array}{c|cccc}
   & -1 & 0 & +1 & +2 \\ \hline
-1 & 0 & 0.1 & 0.4 & 0.5 \\
 0 & 0 & 0.1 & 0.4 & 0.5 \\
+1 & 0.1 & 0.4 & 0.5 & 0 \\
+2 & 0 & 0.1 & 0.4 & 0.5
\end{array}
$$
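The calculation "continuing in this manner" can be mechanized. The sketch below builds the transition probabilities of an $(s, S)$ policy from a finite demand distribution; the function name `inventory_matrix` and its return format are assumptions made for illustration, not part of the text.

```python
def inventory_matrix(demand, s, S):
    """Build the (s, S) inventory transition probabilities.

    demand[k] is Pr{demand = k}. Returns (states, P), where P[i][j] is the
    probability of moving from end-of-period level i to level j.
    """
    m = len(demand) - 1                      # largest possible demand
    states = list(range(s + 1 - m, S + 1))   # lowest reachable level up to S
    P = {i: {j: 0.0 for j in states} for i in states}
    for i in states:
        start = i if i > s else S            # restock up to S when i <= s
        for k, a_k in enumerate(demand):
            P[i][start - k] += a_k           # demand k leaves start - k
    return states, P

# The spare parts example: demand 0, 1, 2 with probabilities 0.5, 0.4, 0.1.
states, P = inventory_matrix([0.5, 0.4, 0.1], s=0, S=2)
for i in states:
    print(i, [P[i][j] for j in states])
```

For the spare parts example the rows printed for levels $-1$, 0, and 2 coincide, since each of them either triggers restocking to $S = 2$ or already equals 2.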



Important quantities of interest in inventory models of this type are the
long-term fraction of periods in which demand is not met ($X_n < 0$) and
long-term average inventory level. Using the notation $p_j^{(n)} = \Pr\{X_n = j\}$,
we give these quantities respectively as $\lim_{n\to\infty} \sum_{j<0} p_j^{(n)}$ and
$\lim_{n\to\infty} \sum_{j>0} j\,p_j^{(n)}$. This illustrates the importance of determining conditions under
which the probabilities $p_j^{(n)}$ stabilize and approach limiting probabilities $\pi_j$
as $n \to \infty$ and of determining methods for calculating the limiting
probabilities $\pi_j$ when they exist. These topics are the subject of IV.


3.2. The Ehrenfest Urn Model


A classical mathematical description of diffusion through a membrane is 
the famous Ehrenfest urn model. Imagine two containers containing a 
total of 2a balls (molecules). Suppose the first container, labeled A, holds 
k balls and the second container, B, holds the remaining 2a — k balls. A 
ball is selected at random (all selections are equally likely) from the total- 
ity of the 2a balls and moved to the other container. (A molecule diffuses 
at random through the membrane.) Each selection generates a transition 
of the process. Clearly, the balls fluctuate between the two containers with 
an average drift from the urn with the excess numbers to the one with the 
smaller concentration. 

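Let $Y_n$ be the number of balls in urn A at the nth stage, and define
$X_n = Y_n - a$. Then $\{X_n\}$ is a Markov chain on the states
$i = -a, -a + 1, \ldots, -1, 0, +1, \ldots, a$ with transition probabilities

$$
P_{ij} = \begin{cases}
\dfrac{a - i}{2a} & \text{if } j = i + 1, \\[2mm]
\dfrac{a + i}{2a} & \text{if } j = i - 1, \\[2mm]
0 & \text{otherwise.}
\end{cases}
$$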

An important quantity in the Ehrenfest urn model is the long-term, or 
equilibrium, distribution of the number of balls in each urn. 


3.3. Markov Chains in Genetics 


The following idealized genetics model was introduced by S. Wright to
investigate the fluctuation of gene frequency under the influence of mutation
and selection. We begin by describing the so-called simple haploid
model of random reproduction, disregarding mutation pressures and selective
forces. We assume that we are dealing with a fixed population size
of $2N$ genes composed of type-a and type-A individuals. The makeup of
the next generation is determined by $2N$ independent Bernoulli trials as
follows: If the parent population consists of $j$ a-genes and $2N - j$ A-genes,
then each trial results in a or A with probabilities

$$p_j = \frac{j}{2N}, \qquad q_j = 1 - \frac{j}{2N},$$

respectively. Repeated selections are done with replacement. By this procedure
we generate a Markov chain $\{X_n\}$, where $X_n$ is the number of
a-genes in the nth generation among a constant population size of $2N$ individuals.
The state space contains the $2N + 1$ values $\{0, 1, 2, \ldots, 2N\}$.
The transition probability matrix is computed according to the binomial
distribution as

$$\Pr\{X_{n+1} = k \mid X_n = j\} = P_{jk} = \binom{2N}{k} p_j^k q_j^{2N-k} \qquad (j, k = 0, 1, \ldots, 2N). \tag{3.3}$$


For some discussion of the biological justification of these postulates
we refer the reader to Fisher.¹

Notice that states 0 and $2N$ are completely absorbing in the sense that
once $X_n = 0$ (or $2N$) then $X_{n+k} = 0$ (or $2N$, respectively) for all $k \ge 0$. One
of the questions of interest is to determine the probability, under the condition
$X_0 = i$, that the population will attain fixation, i.e., that it will become
a pure population composed only of a-genes or A-genes. It is also
pertinent to determine the rate of approach to fixation. We will examine
such questions in our general analysis of absorption probabilities.

A more complete model takes account of mutation pressures. We assume
that prior to the formation of the new generation each gene has the
possibility to mutate, that is, to change into a gene of the other kind.
Specifically, we assume that for each gene the mutation a → A occurs
with probability $\alpha$, and A → a occurs with probability $\beta$. Again we assume
that the composition of the next generation is determined by $2N$
independent binomial trials. The relevant values of $p_j$ and $q_j$ when the parent
population consists of $j$ a-genes are now taken to be

$$
\begin{aligned}
p_j &= \frac{j}{2N}(1 - \alpha) + \left(1 - \frac{j}{2N}\right)\beta \\
\text{and} \qquad & \\
q_j &= \frac{j}{2N}\,\alpha + \left(1 - \frac{j}{2N}\right)(1 - \beta).
\end{aligned} \tag{3.4}
$$

¹ R. A. Fisher, The Genetical Theory of Natural Selection, Oxford (Clarendon) Press,
London and New York, 1962.

The rationale is as follows: We assume that the mutation pressures oper- 
ate first, after which a new gene is chosen by random selection from the 
population. Now, the probability of selecting an a-gene after the mutation 
forces have acted is just 1/(2N) times the number of a-genes present; 
hence the average probability (averaged with respect to the possible mutations)
is simply $1/(2N)$ times the average number of a-genes after mutation.
But this average number is clearly $j(1 - \alpha) + (2N - j)\beta$, which leads
at once to (3.4).

The transition probabilities of the associated Markov chain are calculated
by (3.3) using the values of $p_j$ and $q_j$ given in (3.4).
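The binomial transition probabilities (3.3) are easy to tabulate for a small population. The sketch below takes the success probability to be $p_j = \frac{j}{2N}(1 - \alpha) + (1 - \frac{j}{2N})\beta$, which is $j(1 - \alpha) + (2N - j)\beta$ divided by $2N$ and reduces to $j/(2N)$ when there is no mutation. The function name `wright_fisher_matrix` and the choice of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def wright_fisher_matrix(N, alpha=Fraction(0), beta=Fraction(0)):
    """P[j][k] from (3.3), with p_j adjusted for mutation rates alpha, beta."""
    M = 2 * N                                    # total number of genes
    P = []
    for j in range(M + 1):
        frac_a = Fraction(j, M)                  # current fraction of a-genes
        p = frac_a * (1 - alpha) + (1 - frac_a) * beta
        q = 1 - p
        P.append([comb(M, k) * p**k * q**(M - k) for k in range(M + 1)])
    return P

no_mutation = wright_fisher_matrix(1)
print(no_mutation[0])    # state 0 is absorbing: all mass stays at k = 0
print(no_mutation[1])    # binomial(2, 1/2) row

with_mutation = wright_fisher_matrix(1, Fraction(1, 10), Fraction(1, 10))
print(with_mutation[0])  # state 0 is no longer absorbing
```

With both mutation rates positive, the first row places positive probability on every state, which is consistent with the remark that fixation no longer occurs.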

If $\alpha\beta > 0$, then fixation will not occur in any state. Instead, as $n \to \infty$,
the distribution function of $X_n$ will approach a steady-state distribution
of a random variable $\xi$, where $\Pr\{\xi = k\} = \pi_k$ $(k = 0, 1, 2, \ldots, 2N)$
$\left(\sum_{k=0}^{2N} \pi_k = 1,\ \pi_k > 0\right)$. The distribution function of $\xi$ is called the
steady-state gene frequency distribution.

We return to the simple random mating model and discuss the concept
of a selection force operating in favor of, say, a-genes. Suppose we wish
to impose a selective advantage for a-genes over A-genes so that the relative
number of offspring have expectations proportional to $1 + s$ and 1,
respectively, where $s$ is small and positive. We replace $p_j = j/(2N)$ and
$q_j = 1 - j/(2N)$ by

$$p_j = \frac{(1 + s)j}{2N + sj}, \qquad q_j = 1 - p_j,$$

and build the next generation by binomial sampling as before. If the parent
population consisted of $j$ a-genes, then in the next generation the expected
population sizes of a-genes and A-genes, respectively, are

$$2N\,\frac{(1 + s)j}{2N + sj}, \qquad 2N\,\frac{2N - j}{2N + sj}.$$

The ratio of expected population size of a-genes to A-genes at the
$(n + 1)$st generation is

$$\frac{(1 + s)j}{2N - j} = \left(\frac{1 + s}{1}\right) \left(\frac{\text{number of a-genes in the nth generation}}{\text{number of A-genes in the nth generation}}\right),$$

which explains the meaning of selection.

3.4. A Discrete Queueing Markov Chain 


Customers arrive for service and take their place in a waiting line. During 
each period of time, a single customer is served, provided that at least one 
customer is present. If no customer awaits service, then during this period 
no service is performed. (We can imagine, for example, a taxi stand at 
which a cab arrives at fixed time intervals to give service. If no one is pre- 
sent, the cab immediately departs.) During a service period new customers 
may arrive. We suppose that the actual number of customers that arrive 
during the nth period is a random variable $\xi_n$ whose distribution is independent
of the period and is given by

$$\Pr\{k \text{ customers arrive in a service period}\} = \Pr\{\xi_n = k\} = a_k,$$

for $k = 0, 1, \ldots$, where $a_k \ge 0$ and $\sum_{k=0}^{\infty} a_k = 1$.

We also assume that $\xi_1, \xi_2, \ldots$ are independent random variables. The
state of the system at the start of each period is defined to be the number
of customers waiting in line for service. If the present state is $i$, then after
the lapse of one period the state is

$$
j = \begin{cases}
i - 1 + \xi & \text{if } i \ge 1, \\
\xi & \text{if } i = 0,
\end{cases} \tag{3.5}
$$

where $\xi$ is the number of new customers having arrived in this period
while a single customer was served. In terms of the random variables of
the process we can express (3.5) formally as

$$X_{n+1} = (X_n - 1)^+ + \xi_n,$$

where $Y^+ = \max\{Y, 0\}$. In view of (3.5), the transition probability matrix
may be calculated easily, and we obtain

$$
P = \begin{pmatrix}
a_0 & a_1 & a_2 & a_3 & a_4 & \cdots \\
a_0 & a_1 & a_2 & a_3 & a_4 & \cdots \\
0 & a_0 & a_1 & a_2 & a_3 & \cdots \\
0 & 0 & a_0 & a_1 & a_2 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots &
\end{pmatrix}
$$

It is intuitively clear that if the expected number of new customers,
$\sum_{k=0}^{\infty} k a_k$, that arrive during a service period exceeds one, then with the passage
of time the length of the waiting line increases without limit. On the
other hand, if $\sum_{k=0}^{\infty} k a_k < 1$, then the length of the waiting line approaches
a statistical equilibrium that is described by a limiting distribution

$$\lim_{n\to\infty} \Pr\{X_n = k \mid X_0 = j\} = \pi_k > 0, \quad \text{for } k = 0, 1, \ldots,$$

where $\sum_{k=0}^{\infty} \pi_k = 1$. Important quantities to be determined by this model
include the long run fraction of time that the service facility is idle, given
by $\pi_0$, and the long run mean time that a customer spends in the system,
given by $\sum_{k=0}^{\infty} (1 + k)\pi_k$.


Exercises 


3.1. Consider a spare parts inventory model in which either 0, 1, or 2 repair
parts are demanded in any period, with

$$\Pr\{\xi_n = 0\} = 0.4, \quad \Pr\{\xi_n = 1\} = 0.3, \quad \Pr\{\xi_n = 2\} = 0.3,$$

and suppose $s = 0$ and $S = 3$. Determine the transition probability matrix
for the Markov chain $\{X_n\}$, where $X_n$ is defined to be the quantity on hand
at the end of period n.


3.2. Consider two urns A and B containing a total of $N$ balls. An experiment
is performed in which a ball is selected at random (all selections
equally likely) at time $t$ $(t = 1, 2, \ldots)$ from among the totality of $N$ balls.
Then an urn is selected at random (A is chosen with probability $p$ and B
is chosen with probability $q$) and the ball previously drawn is placed in
this urn. The state of the system at each trial is represented by the number
of balls in A. Determine the transition matrix for this Markov chain.


3.3. Consider the inventory model of Section 3.1. Suppose that $S = 3$.
Set up the corresponding transition probability matrix for the end-of-period
inventory level $X_n$.


3.4. Consider the inventory model of Section 3.1. Suppose that $S = 3$
and that the probability distribution for demand is $\Pr\{\xi = 0\} = 0.1$,
$\Pr\{\xi = 1\} = 0.4$, $\Pr\{\xi = 2\} = 0.3$, and $\Pr\{\xi = 3\} = 0.2$. Set up the
corresponding transition probability matrix for the end-of-period inventory
level $X_n$.


3.5. An urn initially contains a single red ball and a single green ball. A
ball is drawn at random, removed, and replaced by a ball of the opposite
color, and this process repeats so that there are always exactly two balls in
the urn. Let $X_n$ be the number of red balls in the urn after n draws, with
$X_0 = 1$. Specify the transition probabilities for the Markov chain $\{X_n\}$.


Problems 


3.1. An urn contains six tags, of which three are red and three green. 
Two tags are selected from the urn. If one tag is red and the other is green, 
then the selected tags are discarded and two blue tags are returned to the 
urn. Otherwise, the selected tags are returned to the urn. This process repeats
until the urn contains only blue tags. Let $X_n$ denote the number of red
tags in the urn after the nth draw, with $X_0 = 3$. (This is an elementary
model of a chemical reaction in which red and green atoms combine to 
form a blue molecule.) Give the transition probability matrix. 


3.2. Three fair coins are tossed, and we let $X_1$ denote the number of
heads that appear. Those coins that were heads on the first trial (there were
$X_1$ of them) we pick up and toss again, and now we let $X_2$ be the total number
of tails, including those left from the first toss. We toss again all coins
showing tails, and let $X_3$ be the resulting total number of heads, including
those left from the previous toss. We continue the process. The pattern is,
Count heads, toss heads, count tails, toss tails, count heads, toss heads,
etc., and $X_0 = 3$. Then $\{X_n\}$ is a Markov chain. What is the transition probability
matrix?


3.3. Consider the inventory model of Section 3.1. Suppose that unfulfilled
demand is not back ordered but is lost.


(a) Set up the corresponding transition probability matrix for the end-of-period
inventory level $X_n$.

(b) Express the long run fraction of lost demand in terms of the demand
distribution and limiting probabilities for the end-of-period
inventory.


3.4. Consider the queueing model of Section 3.4. Now suppose that at 
most a single customer arrives during a single period, but that the service 
time of a customer is a random variable Z with the geometric probability 
distribution 


$$\Pr\{Z = k\} = \alpha(1 - \alpha)^{k-1} \quad \text{for } k = 1, 2, \ldots.$$


Specify the transition probabilities for the Markov chain whose state is the
number of customers waiting for service or being served at the start of
each period. Assume that the probability that a customer arrives in a period
is $\beta$ and that no customer arrives with probability $1 - \beta$.


3.5. You are going to successively flip a quarter until the pattern HHT 
appears; that is, until you observe two successive heads followed by a 
tails. In order to calculate some properties of this game, you set up a 
Markov chain with the following states: 0, H, HH, and HHT, where 0 rep- 
resents the starting point, H represents a single observed head on the last 
flip, HH represents two successive heads on the last two flips, and HHT is 
the sequence that you are looking for. Observe that if you have just tossed 
a tails, followed by a heads, a next toss of a tails effectively starts you over 
again in your quest for the HHT sequence. Set up the transition probabil- 
ity matrix. 


3.6. Two teams, A and B, are to play a best of seven series of games. 
Suppose that the outcomes of successive games are independent, and each 
is won by A with probability p and won by B with probability 1 — p. Let 
the state of the system be represented by the pair (a, b), where a is the 
number of games won by A, and b is the number of games won by B. 
Specify the transition probability matrix. Note that $a + b \le 7$ and that the
series ends whenever $a = 4$ or $b = 4$.


3.7. A component in a system is placed into service, where it operates
until its failure, whereupon it is replaced at the end of the period with a
new component having statistically identical properties, and the process
repeats. The probability that a component lasts for $k$ periods is $\alpha_k$, for
$k = 1, 2, \ldots$. Let $X_n$ be the remaining life of the component in service at
the end of period n. Then $X_n = 0$ means that $X_{n+1}$ will be the total operating
life of the next component. Give the transition probabilities for the
Markov chain $\{X_n\}$.


3.8. Two urns A and B contain a total of $N$ balls. Assume that at time $t$
there were exactly $k$ balls in A. At time $t + 1$ an urn is selected at random
in proportion to its contents (i.e., A is chosen with probability $k/N$ and B
is chosen with probability $(N - k)/N$). Then a ball is selected from A with
probability $p$ or from B with probability $q$ and placed in the previously
chosen urn. Determine the transition matrix for this Markov chain.


3.9. Suppose that two urns A and B contain a total of $N$ balls. Assume
that at time $t$ there are exactly $k$ balls in A. At time $t + 1$ a ball and an urn
are chosen with probability depending on the contents of the urn (i.e., a
ball is chosen from A with probability $k/N$ or from B with probability
$(N - k)/N$). Then the ball is placed into one of the urns, where urn A is
chosen with probability $k/N$ or urn B is chosen with probability $(N - k)/N$.
Determine the transition matrix of the Markov chain with states represented
by the contents of A.


3.10. Consider a discrete-time, periodic review inventory model and let
$\xi_n$ be the total demand in period n, and let $X_n$ be the inventory quantity on
hand at the end of period n. An $(s, S)$ inventory policy is used: If the end-of-period
stock is not greater than $s$, then a quantity is instantly procured
to bring the level up to $S$. If the end-of-period stock exceeds $s$, then no
replenishment takes place.


(a) Suppose that $s = 1$, $S = 4$, and $X_0 = S = 4$. If the period demands
turn out to be $\xi_1 = 2$, $\xi_2 = 3$, $\xi_3 = 4$, $\xi_4 = 0$, $\xi_5 = 2$, $\xi_6 = 1$,
$\xi_7 = 2$, $\xi_8 = 2$, what are the end-of-period stock levels $X_n$ for
periods $n = 1, 2, \ldots, 8$?

(b) Suppose $\xi_1, \xi_2, \ldots$ are independent random variables where
$\Pr\{\xi_n = 0\} = 0.1$, $\Pr\{\xi_n = 1\} = 0.3$, $\Pr\{\xi_n = 2\} = 0.3$, $\Pr\{\xi_n = 3\} = 0.2$,
and $\Pr\{\xi_n = 4\} = 0.1$. Then $X_0, X_1, \ldots$ is a Markov chain. Determine
P,, and Py.


4. First Step Analysis 


A surprising number of functionals on a Markov chain can be evaluated 
by a technique that we call first step analysis. This method proceeds by 
analyzing, or breaking down, the possibilities that can arise at the end of 
the first transition, and then invoking the law of total probability coupled 
with the Markov property to establish a characterizing relationship among 
the unknown variables. We first applied this technique in Theorem 2.1. In 
this section we develop a series of applications of the technique. 


4.1. Simple First Step Analyses 


Consider the Markov chain $\{X_n\}$ whose transition probability matrix is

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 1 & 0 & 0 \\
1 & \alpha & \beta & \gamma \\
2 & 0 & 0 & 1
\end{array}
$$

where $\alpha > 0$, $\beta > 0$, $\gamma > 0$, and $\alpha + \beta + \gamma = 1$. If the Markov chain begins
in state 1, it remains there for a random duration and then proceeds either
to state 0 or to state 2, where it is trapped or absorbed. That is, once in
state 0 the process remains there for ever after, as it also does in state 2. Two
questions arise: In which state, 0 or 2, is the process ultimately trapped, and
how long, on the average, does it take to reach one of these states? Both
questions are easily answered by instituting a first step analysis.
We begin by more precisely defining the questions. Let

$$T = \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = 2\}$$

be the time of absorption of the process. In terms of this random absorption
time, the two questions ask us to find

$$u = \Pr\{X_T = 0 \mid X_0 = 1\}$$

and

$$v = E[T \mid X_0 = 1].$$

We proceed to institute a first step analysis, considering separately the
three contingencies $X_1 = 0$, $X_1 = 1$, and $X_1 = 2$, with respective probabilities
$\alpha$, $\beta$, and $\gamma$. Consider $u = \Pr\{X_T = 0 \mid X_0 = 1\}$. If $X_1 = 0$, which occurs
with probability $\alpha$, then $T = 1$ and $X_T = 0$. If $X_1 = 2$, which occurs
with probability $\gamma$, then again $T = 1$, but $X_T = 2$. Finally, if $X_1 = 1$, which
occurs with probability $\beta$, then the process returns to state 1 and the problem
repeats from the same state as before. In symbols, we claim that

$$
\begin{aligned}
\Pr\{X_T = 0 \mid X_1 = 0\} &= 1, \\
\Pr\{X_T = 0 \mid X_1 = 2\} &= 0, \\
\Pr\{X_T = 0 \mid X_1 = 1\} &= u,
\end{aligned}
$$

which inserted into the law of total probability gives

$$
\begin{aligned}
u &= \Pr\{X_T = 0 \mid X_0 = 1\} \\
  &= \sum_{k=0}^{2} \Pr\{X_T = 0 \mid X_0 = 1, X_1 = k\} \Pr\{X_1 = k \mid X_0 = 1\} \\
  &= \sum_{k=0}^{2} \Pr\{X_T = 0 \mid X_1 = k\} \Pr\{X_1 = k \mid X_0 = 1\}
     \qquad \text{(by the Markov property)} \\
  &= 1(\alpha) + u(\beta) + 0(\gamma).
\end{aligned}
$$

Thus we obtain the equation

$$u = \alpha + \beta u, \tag{4.1}$$

which gives

$$u = \frac{\alpha}{1 - \beta} = \frac{\alpha}{\alpha + \gamma}.$$

Observe that this quantity is the conditional probability of a transition to
0, given that a transition to 0 or 2 occurred. That is, the answer makes
sense.

We turn to determining the mean time to absorption, again analyzing
the possibilities arising on the first step. The absorption time $T$ is always
at least 1. If either $X_1 = 0$ or $X_1 = 2$, then no further steps are required. If,
on the other hand, $X_1 = 1$, then the process is back at its starting point, and
on the average, $v = E[T \mid X_0 = 1]$ additional steps are required for absorption.
Weighting these contingencies by their respective probabilities, we
obtain for $v = E[T \mid X_0 = 1]$,

$$
\begin{aligned}
v &= 1 + \alpha(0) + \beta(v) + \gamma(0) \\
  &= 1 + \beta v,
\end{aligned} \tag{4.2}
$$

which gives

$$v = \frac{1}{1 - \beta}.$$

In the example just studied, the reader is invited to verify that $T$ has the
geometric distribution in which

$$\Pr\{T > k \mid X_0 = 1\} = \beta^k \quad \text{for } k = 0, 1, \ldots,$$

and therefore

$$E[T \mid X_0 = 1] = \sum_{k=0}^{\infty} \Pr\{T > k \mid X_0 = 1\} = \frac{1}{1 - \beta}.$$

That is, a direct calculation verifies the result of the first step analysis.
Unfortunately, in more general Markov chains a direct calculation is rarely
possible, and first step analysis provides the only solution technique.
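The agreement between the two approaches can also be checked numerically. The sketch below sums the geometric series term by term and compares the totals with the closed forms $\alpha/(1 - \beta)$ and $1/(1 - \beta)$ obtained from the first step analysis; the function name, the number of terms, and the particular values of $\alpha$, $\beta$, $\gamma$ are assumptions chosen only for illustration.

```python
def absorb_prob_and_mean_time(alpha, beta, terms=2000):
    """Approximate u and v for the three-state chain by direct summation.

    Absorption in state 0 at step k + 1 has probability beta**k * alpha,
    and Pr{T > k | X_0 = 1} = beta**k.
    """
    u = sum(beta**k * alpha for k in range(terms))
    v = sum(beta**k for k in range(terms))
    return u, v

alpha, beta, gamma = 0.3, 0.5, 0.2      # hypothetical values summing to 1
u, v = absorb_prob_and_mean_time(alpha, beta)
print(u, alpha / (1 - beta))            # both close to 0.6
print(v, 1 / (1 - beta))                # both close to 2.0
```

The series is truncated after a fixed number of terms, so the match is approximate; the neglected tail is of order $\beta^{2000}$ here.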

A significant extension occurs when we move up to the four-state
Markov chain whose transition probability matrix is

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & P_{10} & P_{11} & P_{12} & P_{13} \\
2 & P_{20} & P_{21} & P_{22} & P_{23} \\
3 & 0 & 0 & 0 & 1
\end{array}
$$

Absorption now occurs in states 0 and 3, and states 1 and 2 are “transient.”
The probability of ultimate absorption in state 0, say, now depends on the
transient state in which the process began. Accordingly, we must extend
our notation to include the starting state. Let

$$
\begin{aligned}
T &= \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = 3\}, \\
u_i &= \Pr\{X_T = 0 \mid X_0 = i\} \quad \text{for } i = 1, 2,
\end{aligned}
$$

and

$$v_i = E[T \mid X_0 = i] \quad \text{for } i = 1, 2.$$

We may extend the definitions for $u_i$ and $v_i$ in a consistent and commonsense
manner by prescribing $u_0 = 1$, $u_3 = 0$, and $v_0 = v_3 = 0$.

The first step analysis now requires us to consider the two possible
starting states $X_0 = 1$ and $X_0 = 2$ separately. Considering $X_0 = 1$ and applying
a first step analysis to $u_1 = \Pr\{X_T = 0 \mid X_0 = 1\}$, we obtain

$$u_1 = P_{10} + P_{11}u_1 + P_{12}u_2. \tag{4.3}$$

The three terms on the right correspond to the contingencies $X_1 = 0$,
$X_1 = 1$, and $X_1 = 2$, respectively, with the conditional probabilities

$$
\begin{aligned}
\Pr\{X_T = 0 \mid X_1 = 0\} &= 1, \\
\Pr\{X_T = 0 \mid X_1 = 1\} &= u_1,
\end{aligned}
$$

and

$$\Pr\{X_T = 0 \mid X_1 = 2\} = u_2.$$

The law of total probability then applies to give (4.3), just as it was used
in obtaining (4.1). A similar equation is obtained for $u_2$:

$$u_2 = P_{20} + P_{21}u_1 + P_{22}u_2. \tag{4.4}$$

The two equations in $u_1$ and $u_2$ are now solved simultaneously. To give a
numerical example, we will suppose

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.4 & 0.3 & 0.2 & 0.1 \\
2 & 0.1 & 0.3 & 0.3 & 0.3 \\
3 & 0 & 0 & 0 & 1
\end{array} \tag{4.5}
$$

The first step analysis equations (4.3) and (4.4) for $u_1$ and $u_2$ are

$$
\begin{aligned}
u_1 &= 0.4 + 0.3u_1 + 0.2u_2, \\
u_2 &= 0.1 + 0.3u_1 + 0.3u_2,
\end{aligned}
$$

or

$$
\begin{aligned}
0.7u_1 - 0.2u_2 &= 0.4, \\
-0.3u_1 + 0.7u_2 &= 0.1.
\end{aligned}
$$

The solution is $u_1 = \frac{30}{43}$ and $u_2 = \frac{19}{43}$. Note that one cannot, in general, solve
for $u_1$ without bringing in $u_2$, and vice versa. The result $u_2 = \frac{19}{43}$ tells us that
once begun in state $X_0 = 2$, the Markov chain $\{X_n\}$ described by (4.5) will
ultimately end up in state 0 with probability $u_2 = \frac{19}{43}$, and alternatively, will
be absorbed in state 3 with probability $1 - u_2 = \frac{24}{43}$.

The mean time to absorption also depends on the starting state. The first
step analysis equations for $v_i = E[T \mid X_0 = i]$ are

$$
\begin{aligned}
v_1 &= 1 + P_{11}v_1 + P_{12}v_2, \\
v_2 &= 1 + P_{21}v_1 + P_{22}v_2.
\end{aligned} \tag{4.6}
$$

The right side of (4.6) asserts that at least one step is always taken. If the
first move is to either $X_1 = 1$ or $X_1 = 2$, then additional steps are needed,
and on the average, these are $v_1$ and $v_2$, respectively. Weighting the contingencies
$X_1 = 1$ and $X_1 = 2$ by their respective probabilities and summing
according to the law of total probability results in (4.6).

For the transition matrix given in (4.5), the equations are

$$
\begin{aligned}
v_1 &= 1 + 0.3v_1 + 0.2v_2, \\
v_2 &= 1 + 0.3v_1 + 0.3v_2,
\end{aligned}
$$

and their solutions are $v_1 = \frac{90}{43}$ and $v_2 = \frac{100}{43}$. Again, $v_1$ cannot be obtained without
also considering $v_2$, and vice versa. For a process that begins in state
$X_0 = 2$, on the average $v_2 = \frac{100}{43} \approx 2.33$ steps will transpire prior to absorption.
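Both pairs of equations share the same coefficients and differ only in their right-hand sides, so they can be solved with one small routine. The sketch below uses Cramer's rule with exact fractions for the numerical matrix (4.5); the helper name `solve_2x2` is an assumption made for illustration.

```python
from fractions import Fraction as F

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1, a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

# Entries of (4.5) involving the transient states 1 and 2.
P10, P11, P12 = F(4, 10), F(3, 10), F(2, 10)
P20, P21, P22 = F(1, 10), F(3, 10), F(3, 10)

# (4.3)-(4.4): (1 - P11) u1 - P12 u2 = P10,  -P21 u1 + (1 - P22) u2 = P20
u1, u2 = solve_2x2(1 - P11, -P12, -P21, 1 - P22, P10, P20)

# (4.6): the same coefficients with right-hand side (1, 1)
v1, v2 = solve_2x2(1 - P11, -P12, -P21, 1 - P22, F(1), F(1))

print(u1, u2)    # 30/43 19/43
print(v1, v2)    # 90/43 100/43
```

Working in exact fractions reproduces the denominators of 43, which come from the determinant $0.7 \times 0.7 - 0.2 \times 0.3 = 0.43$.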

To study the method in a more general context, let $\{X_n\}$ be a finite-state
Markov chain whose states are labeled $0, 1, \ldots, N$. Suppose that states
$0, 1, \ldots, r - 1$ are transient² in that $P_{ij}^{(n)} \to 0$ as $n \to \infty$ for $0 \le i, j < r$,
while states $r, \ldots, N$ are absorbing ($P_{ii} = 1$ for $r \le i \le N$). The transition
matrix has the form

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix} \tag{4.7}$$

where 0 is an $(N - r + 1) \times r$ matrix all of whose entries are zero, $I$ is an
$(N - r + 1) \times (N - r + 1)$ identity matrix, and $Q_{ij} = P_{ij}$ for $0 \le i, j < r$.

² The definition of a transient state is different for an infinite-state Markov chain. See IV,
Section 3.

Started at one of the transient states $X_0 = i$, where $0 \le i < r$, such
a process will remain in the transient states for some random duration,
but ultimately the process gets trapped in one of the absorbing states
$i = r, \ldots, N$. Functionals of importance are the mean duration until absorption
and the probability distribution over the states in which absorption
takes place.

Let us consider the second question first and fix a state $k$ among the absorbing
states ($r \le k \le N$). The probability of ultimate absorption in state
$k$, as opposed to some other absorbing state, depends on the initial state
$X_0 = i$. Let $U_{ik} = u_i$ denote this probability, where we suppress the target
state $k$ in the notation for typographical convenience.

We begin a first step analysis by enumerating the possibilities in the
first transition. Starting from state $i$, then with probability $P_{ik}$ the process
immediately goes to state $k$, thereafter to remain, and this is the first possibility
considered. Alternatively, the process could move on its first step
to an absorbing state $j \ne k$, where $r \le j \le N$, in which case ultimate absorption
in state $k$ is precluded. Finally, the process could move to a transient
state $j < r$. Because of the Markov property, once in state $j$, then the
probability of ultimate absorption in state $k$ is $u_j = U_{jk}$ by definition.
Weighting the enumerated possibilities by their respective probabilities
via the law of total probability, we obtain the relation

$$
\begin{aligned}
u_i &= \Pr\{\text{Absorption in } k \mid X_0 = i\} \\
    &= \sum_{j=0}^{N} \Pr\{\text{Absorption in } k \mid X_0 = i, X_1 = j\}\, P_{ij} \\
    &= P_{ik} + \sum_{\substack{j=r \\ j \ne k}}^{N} P_{ij} \times 0 + \sum_{j=0}^{r-1} P_{ij} u_j.
\end{aligned}
$$

To summarize, for a fixed absorbing state $k$, the quantities

$$u_i = U_{ik} = \Pr\{\text{Absorption in } k \mid X_0 = i\} \quad \text{for } 0 \le i < r$$

satisfy the inhomogeneous system of linear equations

$$U_{ik} = P_{ik} + \sum_{j=0}^{r-1} P_{ij} U_{jk}, \qquad i = 0, 1, \ldots, r - 1. \tag{4.8}$$


Example A Maze A white rat is put into the maze shown: 


In the absence of learning, one might hypothesize that the rat would move
through the maze at random; i.e., if there are $k$ ways to leave a compartment,
then the rat would choose each of these with probability $1/k$. Assume
that the rat makes one change to some adjacent compartment at each
unit of time and let $X_n$ denote the compartment occupied at stage n. We
suppose that compartment 7 contains food and compartment 8 contains an
electrical shocking mechanism, and we ask the probability that the rat,
moving at random, encounters the food before being shocked. The appropriate
transition probability matrix is


01234567 8 


loi 
Lol 


ad [ome ad | me 
ome 


Wl 
CS Jenne 


an) 
II 
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Let u, = u,7) denote the probability of absorption in the food compart- 
ment 7, given that the rat is dropped initially in compartment i. Then equa- 
tions (4.8) become, in this particular instance, 


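$$
\begin{aligned}
u_0 &= \tfrac{1}{2} u_1 + \tfrac{1}{2} u_2, \\
u_1 &= \tfrac{1}{3} + \tfrac{1}{3} u_0 + \tfrac{1}{3} u_3, \\
u_2 &= \tfrac{1}{3} u_0 + \tfrac{1}{3} u_3, \\
u_3 &= \tfrac{1}{4} u_1 + \tfrac{1}{4} u_2 + \tfrac{1}{4} u_4 + \tfrac{1}{4} u_5, \\
u_4 &= \tfrac{1}{3} + \tfrac{1}{3} u_3 + \tfrac{1}{3} u_6, \\
u_5 &= \tfrac{1}{3} u_3 + \tfrac{1}{3} u_6, \\
u_6 &= \tfrac{1}{2} u_4 + \tfrac{1}{2} u_5.
\end{aligned}
$$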


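Turning to the solution, we see that the symmetry of the maze implies that
$u_0 = u_6$, $u_2 = u_5$, and $u_1 = u_4$. We also must have $u_3 = \frac{1}{2}$.
With these simplifications the equations for $u_0$, $u_1$, and $u_2$ become

$$
\begin{aligned}
u_0 &= \tfrac{1}{2} u_1 + \tfrac{1}{2} u_2, \\
u_1 &= \tfrac{1}{2} + \tfrac{1}{3} u_0, \\
u_2 &= \tfrac{1}{6} + \tfrac{1}{3} u_0,
\end{aligned}
$$

and the natural substitutions give
$u_0 = \frac{1}{2}\left(\frac{1}{2} + \frac{1}{3} u_0\right) + \frac{1}{2}\left(\frac{1}{6} + \frac{1}{3} u_0\right)$,
or $u_0 = \frac{1}{2}$, $u_1 = \frac{2}{3}$, and $u_2 = \frac{1}{3}$.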

One might compare these theoretical values under random moves with 
actual observations as an indication of whether or not learning is taking 
place. 
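
As a computational check on the maze example, the following Python sketch
solves the seven first step equations exactly with rational arithmetic. The
helper `solve` and the `neighbours` table are our own illustrative names; the
adjacency list is read off the equations above rather than from the maze
figure, so it should be treated as an assumption.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Compartments reachable from each transient compartment (assumed layout).
neighbours = {0: [1, 2], 1: [0, 3, 7], 2: [0, 3, 8], 3: [1, 2, 4, 5],
              4: [3, 6, 7], 5: [3, 6, 8], 6: [4, 5]}
FOOD = 7
transient = sorted(neighbours)

# Assemble (I - Q) u = (column of P for the food state), i.e. equations (4.8).
A = [[F(int(i == j)) for j in transient] for i in transient]
b = [F(0)] * len(transient)
for i in transient:
    p = F(1, len(neighbours[i]))      # each exit chosen with probability 1/k
    for j in neighbours[i]:
        if j == FOOD:
            b[i] += p                 # one-step absorption at the food
        elif j in neighbours:
            A[i][j] -= p              # move to another transient compartment
        # a move to compartment 8 contributes 0 and needs no term

u = solve(A, b)
print([str(x) for x in u])
```

The values for compartments 0, 1, and 2 can be compared with the hand
solution obtained from the symmetry argument.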


4.2. The General Absorbing Markov Chain 


Let $\{X_n\}$ be a Markov chain whose transition probability matrix takes the
form (4.7). We turn to a more general form of the first question by
introducing the random absorption time $T$. Formally, we define

$$T = \min\{n \ge 0;\ X_n \ge r\}.$$

Let us suppose that associated with each transient state $i$ is a rate $g(i)$
and that we wish to determine the mean total rate that is accumulated up to
absorption. Let $w_i$ be this mean total amount, where the subscript $i$
denotes the starting position $X_0 = i$. To be precise, let

$$w_i = E\left[\sum_{n=0}^{T-1} g(X_n) \,\middle|\, X_0 = i\right].$$


The choice $g(i) = 1$ for all $i$ yields
$\sum_{n=0}^{T-1} g(X_n) = \sum_{n=0}^{T-1} 1 = T$, and then $w_i$ is identical
to $v_i = E[T \mid X_0 = i]$, the mean time until absorption. For a transient
state $k$, the choice

$$g(i) = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \neq k, \end{cases}$$

gives $w_i = W_{ik}$, the mean number of visits to state $k$ ($0 \le k < r$)
prior to absorption.

We again proceed via a first step analysis. The sum $\sum_{n=0}^{T-1} g(X_n)$
always includes the first term $g(X_0) = g(i)$. In addition, if a transition
is made from $i$ to a transient state $j$, then the sum includes future terms
as well. By invoking the Markov property we deduce that this future sum
proceeding from state $j$ has an expected value equal to $w_j$. Weighting this
by the transition probability $P_{ij}$ and then summing all contributions in
accordance with the law of total probability, we obtain the joint relations

$$w_i = g(i) + \sum_{j=0}^{r-1} P_{ij} w_j \qquad \text{for } i = 0, \ldots, r-1. \tag{4.9}$$


The special case in which $g(i) = 1$ for all $i$ determines
$v_i = E[T \mid X_0 = i]$ as solving

$$v_i = 1 + \sum_{j=0}^{r-1} P_{ij} v_j \qquad \text{for } i = 0, 1, \ldots, r-1. \tag{4.10}$$


The case in which

$$g(i) = \delta_{ik} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \neq k, \end{cases}$$




determines $W_{ik}$, the mean number of visits to state $k$ prior to absorption
starting from state $i$, as solving

$$W_{ik} = \delta_{ik} + \sum_{j=0}^{r-1} P_{ij} W_{jk} \qquad \text{for } i = 0, 1, \ldots, r-1. \tag{4.11}$$
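
Equations (4.9) through (4.11) share one structure, which the following
Python sketch exploits. It approximates the solution of $w = g + Qw$ by
repeated substitution, where $Q$ is the transient block of the transition
matrix. The function name `accumulate` and the two-state block `Q` are
hypothetical choices made only for illustration, and repeated substitution
is just one of several ways to solve such a system.

```python
def accumulate(Q, g, sweeps=500):
    """Approximate the solution of w = g + Q w, equation (4.9)."""
    r = len(g)
    w = [0.0] * r
    for _ in range(sweeps):
        # One substitution step: w_i <- g(i) + sum_j Q_ij w_j.
        w = [g[i] + sum(Q[i][j] * w[j] for j in range(r)) for i in range(r)]
    return w

# Hypothetical transient block with r = 2; the probability mass missing
# from each row is the chance of moving to an absorbing state.
Q = [[0.50, 0.25],
     [0.25, 0.50]]

v = accumulate(Q, [1.0, 1.0])    # g = 1: mean time to absorption, (4.10)
W0 = accumulate(Q, [1.0, 0.0])   # g = indicator of state 0: W_{i0}, (4.11)
W1 = accumulate(Q, [0.0, 1.0])   # g = indicator of state 1: W_{i1}, (4.11)

for i in range(2):
    print(i, v[i], W0[i] + W1[i])
```

For each starting state the two printed numbers should agree, since the
time to absorption is the total number of visits to all transient states.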


Example A Model of Fecundity Changes in sociological patterns 
such as increase in age at marriage, more remarriages after widowhood, 
and increased divorce rates have profound effects on overall population 
growth rates. Here we attempt to model the life span of a female in a pop- 
ulation in order to provide a framework for analyzing the effect of social 
changes on average fecundity. 

The general model we propose has a large number of states delimiting 
the age and status of a typical female in the population. For example, we 
begin with the twelve age groups 0-4 years, 5-9 years, ..., 50-54 years,
55 years and over. In addition, each of these age groups might be further
subdivided according to marital status: single, married, separated,
divorced, or widowed, and might also be subdivided according to the number
of children. Each female would begin in the (0-4, single) category and
end in a distinguished state $\Delta$ corresponding to death or emigration from
the population. However, the duration spent in the various other states 
might differ among different females. Of interest is the mean duration 
spent in the categories of maximum fertility, or more generally a mean 
sum of durations weighted by appropriate fecundity rates. 

When there are a large number of states in the model, as just sketched, 
the relevant calculations require a computer. We turn to a simpler model 
which, while less realistic, will serve to illustrate the concepts and ap- 
proach. We introduce the states 


$E_0$: Prepuberty,    $E_3$: Divorced,
$E_1$: Single,        $E_4$: Widowed,
$E_2$: Married,       $E_5$: $\Delta$,


and we are interested in the mean duration spent in state $E_2$: Married,
since this corresponds to the state of maximum fecundity. To illustrate the
computations, we will suppose the transition probability matrix is

$$
P = \begin{array}{c|cccccc}
    & E_0 & E_1 & E_2 & E_3 & E_4 & E_5 \\ \hline
E_0 & 0 & 0.9 & 0   & 0   & 0   & 0.1 \\
E_1 & 0 & 0.5 & 0.4 & 0   & 0   & 0.1 \\
E_2 & 0 & 0   & 0.6 & 0.2 & 0.1 & 0.1 \\
E_3 & 0 & 0   & 0.4 & 0.5 & 0   & 0.1 \\
E_4 & 0 & 0   & 0.4 & 0   & 0.5 & 0.1 \\
E_5 & 0 & 0   & 0   & 0   & 0   & 1.0
\end{array}
$$


In practice, such a matrix would be estimated from demographic data. 

Every person begins in state $E_0$ and ends in state $E_5$, but a variety of
intervening states may be visited. We wish to determine the mean duration
spent in state $E_2$: Married. The powerful approach of first step analysis
begins by considering the slightly more general problem in which the initial
state is varied. Let $w_i = W_{i2}$ be the mean duration in state $E_2$ given
the initial state $X_0 = E_i$ for $i = 0, 1, \ldots, 5$. We are interested in
$w_0$, the mean duration corresponding to the initial state $E_0$.

First step analysis breaks down, or analyzes, the possibilities arising in
the first transition, and using the Markov property, an equation that relates
$w_0, \ldots, w_5$ results.

We begin by considering $w_0$. From state $E_0$ a transition to one of the
states $E_1$ or $E_5$ occurs, and the mean duration spent in $E_2$ starting
from $E_0$ must be the appropriately weighted average of $w_1$ and $w_5$.
That is,

$$w_0 = 0.9 w_1 + 0.1 w_5.$$

Proceeding in a similar manner, we obtain

$$w_1 = 0.5 w_1 + 0.4 w_2 + 0.1 w_5.$$

The situation changes when the process begins in state $E_2$, because in
counting the mean duration spent in $E_2$, we must count this initial visit
plus any subsequent visits that may occur. Thus for $E_2$ we have

$$w_2 = 1 + 0.6 w_2 + 0.2 w_3 + 0.1 w_4 + 0.1 w_5.$$

The other states give us

$$
\begin{aligned}
w_3 &= 0.4 w_2 + 0.5 w_3 + 0.1 w_5, \\
w_4 &= 0.4 w_2 + 0.5 w_4 + 0.1 w_5, \\
w_5 &= w_5.
\end{aligned}
$$

Since state $E_5$ corresponds to death, it is clear that we must have
$w_5 = 0$. With this prescription, the reduced equations become, after
elementary simplification,

$$
\begin{aligned}
-1.0 w_0 + 0.9 w_1 &= 0, \\
-0.5 w_1 + 0.4 w_2 &= 0, \\
-0.4 w_2 + 0.2 w_3 + 0.1 w_4 &= -1, \\
0.4 w_2 - 0.5 w_3 &= 0, \\
0.4 w_2 - 0.5 w_4 &= 0.
\end{aligned}
$$

The unique solution is

$$w_0 = 4.5, \quad w_1 = 5.00, \quad w_2 = 6.25, \quad w_3 = w_4 = 5.00.$$

Each female, on the average, spends $w_0 = W_{02} = 4.5$ periods in the
childbearing state $E_2$ during her lifetime.
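
The first step result can also be checked empirically. The Python sketch
below simulates many lifetimes under the illustrative transition matrix of
this example and averages the number of periods spent in state $E_2$. The
function name, the seed, and the sample size are arbitrary choices of ours,
and a simulation only approximates the exact value $w_0 = 4.5$.

```python
import random

# Transition matrix of the fecundity example; rows and columns are E0..E5.
P = [
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.5, 0.4, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.6, 0.2, 0.1, 0.1],
    [0.0, 0.0, 0.4, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.4, 0.0, 0.5, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]
MARRIED, ABSORBED = 2, 5

def periods_married(rng):
    """Simulate one lifetime from E0; count the periods spent in E2."""
    state, count = 0, 0
    while state != ABSORBED:
        if state == MARRIED:
            count += 1
        # Draw the next state according to row `state` of P.
        state = rng.choices(range(6), weights=P[state])[0]
    return count

rng = random.Random(12345)       # fixed seed so that runs are repeatable
n_lifetimes = 20000
estimate = sum(periods_married(rng) for _ in range(n_lifetimes)) / n_lifetimes
print(estimate)                  # should lie close to 4.5
```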


Exercises 


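4.1. Find the mean time to reach state 3 starting from state 0 for the
Markov chain whose transition probability matrix is

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 0.4 & 0.3 & 0.2 & 0.1 \\
1 & 0 & 0.7 & 0.2 & 0.1 \\
2 & 0 & 0 & 0.9 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$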


4.2. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 1 & 0 & 0 \\
1 & 0.1 & 0.6 & 0.3 \\
2 & 0 & 0 & 1
\end{array}
$$

(a) Starting in state 1, determine the probability that the Markov chain
ends in state 0.
(b) Determine the mean time to absorption.


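4.3. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.6 & 0.1 & 0.2 \\
2 & 0.2 & 0.3 & 0.4 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$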


(a) Starting in state 1, determine the probability that the Markov chain 
ends in state 0. 
(b) Determine the mean time to absorption. 


4.4. A coin is tossed repeatedly until two successive heads appear. Find
the mean number of tosses required.


Hint: Let $X_n$ be the cumulative number of successive heads. The state
space is 0, 1, 2 and the transition probability matrix is

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & \frac{1}{2} & \frac{1}{2} & 0 \\
1 & \frac{1}{2} & 0 & \frac{1}{2} \\
2 & 0 & 0 & 1
\end{array}
$$


Determine the mean time to reach state 2 starting from state 0 by invok- 
ing a first step analysis. 


4.5. A coin is tossed repeatedly until either two successive heads appear
or two successive tails appear. Suppose the first coin toss results in a head.
Find the probability that the game ends with two successive tails.


4.6. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.4 & 0.1 & 0.4 \\
2 & 0.2 & 0.1 & 0.6 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$


(a) Starting in state 1, determine the probability that the Markov chain 
ends in state 0. 
(b) Determine the mean time to absorption. 


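4.7. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.2 & 0.5 & 0.2 \\
2 & 0.1 & 0.2 & 0.6 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$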


Starting in state 1, determine the mean time that the process spends in 
state 1 prior to absorption and the mean time that the process spends in 
state 2 prior to absorption. Verify that the sum of these is the mean time 
to absorption. 


4.8. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.5 & 0.2 & 0.1 & 0.2 \\
2 & 0.2 & 0.1 & 0.6 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$

Starting in state 1, determine the mean time that the process spends in
state 1 prior to absorption and the mean time that the process spends in
state 2 prior to absorption. Verify that the sum of these is the mean time
to absorption.


4.9. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.2 & 0.5 & 0.2 \\
2 & 0.1 & 0.2 & 0.6 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$

Starting in state 1, determine the probability that the process is absorbed
into state 0. Compare this with the (1,0)th entry in the matrix powers $P^2$,
$P^4$, $P^8$, and $P^{16}$.


Problems 


4.1. Which will take fewer flips, on average: successively flipping a 
quarter until the pattern HHT appears, that is, until you observe two suc- 
cessive heads followed by a tails; or successively flipping a quarter until 
the pattern HTH appears? Can you explain why these are different? 


4.2. A zero-seeking device operates as follows: If it is in state $m$ at time
$n$, then at time $n + 1$, its position is uniformly distributed over the states
$0, 1, \ldots, m - 1$. Find the expected time until the device first hits zero
starting from state $m$.


Note: This is a highly simplified model for an algorithm that seeks a 
maximum over a finite set of points. 


4.3. A zero-seeking device operates as follows: If it is in state $j$ at time
$n$, then at time $n + 1$, its position is 0 with probability $1/j$, and its
position is $k$ (where $k$ is one of the states $1, 2, \ldots, j - 1$) with
probability $2k/j^2$. Find the expected time until the device first hits zero
starting from state $m$.


4.4. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.2 & 0.5 & 0.2 \\
2 & 0.1 & 0.2 & 0.6 & 0.1 \\
3 & 0.2 & 0.2 & 0.3 & 0.3
\end{array}
$$

Starting in state $X_0 = 1$, determine the probability that the process never
visits state 2. Justify your answer.


4.5. A white rat is put into compartment 4 of the maze shown here: 


He moves through the compartments at random; i.e., if there are k ways to 
leave a compartment, he chooses each of these with probability 1/k. What 
is the probability that the rat finds the food in compartment 3 before feel- 
ing the electric shock in compartment 7? 


4.6. Consider the Markov chain whose transition matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & q & p & 0 & 0 & 0 \\
1 & q & 0 & p & 0 & 0 \\
2 & q & 0 & 0 & p & 0 \\
3 & q & 0 & 0 & 0 & p \\
4 & 0 & 0 & 0 & 0 & 1
\end{array}
$$

where $p + q = 1$. Determine the mean time to reach state 4 starting from
state 0. That is, find $E[T \mid X_0 = 0]$ where $T = \min\{n \ge 0;\ X_n = 4\}$.


Hint: Let $v_i = E[T \mid X_0 = i]$ for $i = 0, 1, \ldots, 4$. Establish
equations for $v_0, v_1, \ldots, v_4$ by using a first step analysis and the
boundary condition $v_4 = 0$. Then solve for $v_0$.


4.7. Let $X_n$ be a Markov chain with transition probabilities $P_{ij}$. We are
given a "discount factor" $\beta$ with $0 < \beta < 1$ and a cost function
$c(i)$, and we wish to determine the total expected discounted cost starting
from state $i$, defined by

$$h_i = E\left[\sum_{n=0}^{\infty} \beta^n c(X_n) \,\middle|\, X_0 = i\right].$$

Using a first step analysis, show that $h_i$ satisfies the system of linear
equations

$$h_i = c(i) + \beta \sum_{j} P_{ij} h_j \qquad \text{for all states } i.$$


4.8. An urn contains five red and three green balls. The balls are chosen
at random, one by one, from the urn. If a red ball is chosen, it is removed.
Any green ball that is chosen is returned to the urn. The selection process
continues until all of the red balls have been removed from the urn. What
is the mean duration of the game?


4.9. An urn contains five red and three yellow balls. The balls are chosen
at random, one by one, from the urn. Each ball removed is replaced in the
urn by a yellow ball. The selection process continues until all of the red balls
have been removed from the urn. What is the mean duration of the game?


4.10. You have five fair coins. You toss them all so that they randomly 
fall heads or tails. Those that fall tails in the first toss you pick up and toss 
again. You toss again those that show tails after the second toss, and so on, 
until all show heads. Let X be the number of coins involved in the last 
toss. Find Pr{X = 1}. 


4.11. An urn contains two red and two green balls. The balls are chosen
at random, one by one, and removed from the urn. The selection process
continues until all of the green balls have been removed from the urn.
What is the probability that a single red ball is in the urn at the time that
the last green ball is chosen?


4.12. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 0.3 & 0.2 & 0.5 \\
1 & 0.5 & 0.1 & 0.4 \\
2 & 0 & 0 & 1
\end{array}
$$

and is known to start in state $X_0 = 0$. Eventually, the process will end up
in state 2. What is the probability that when the process moves into state
2, it does so from state 1?


Hint: Let $T = \min\{n \ge 0;\ X_n = 2\}$, and let

$$z_i = \Pr\{X_{T-1} = 1 \mid X_0 = i\} \qquad \text{for } i = 0, 1.$$

Establish and solve the first step equations

$$
\begin{aligned}
z_0 &= 0.3 z_0 + 0.2 z_1, \\
z_1 &= 0.4 + 0.5 z_0 + 0.1 z_1.
\end{aligned}
$$


4.13. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 0.3 & 0.2 & 0.5 \\
1 & 0.5 & 0.1 & 0.4 \\
2 & 0 & 0 & 1
\end{array}
$$

and is known to start in state $X_0 = 0$. Eventually, the process will end up
in state 2. What is the probability that the time $T = \min\{n \ge 0;\ X_n = 2\}$
is an odd number?


4.14. A single die is rolled repeatedly. The game stops the first time that
the sum of two successive rolls is either 5 or 7. What is the probability that
the game stops at a sum of 5?


4.15. A simplified model for the spread of a rumor goes this way: There 
are N = 5 people in a group of friends, of which some have heard the 
rumor and the others have not. During any single period of time, two peo- 
ple are selected at random from the group and assumed to interact. The se- 
lection is such that an encounter between any pair of friends is just as 
likely as between any other pair. If one of these persons has heard the 
rumor and the other has not, then with probability $\alpha = 0.1$ the rumor is
transmitted. Let $X_n$ denote the number of friends who have heard the
rumor at the end of the $n$th period.

Assuming that the process begins at time 0 with a single person know- 
ing the rumor, what is the mean time that it takes for everyone to hear it? 


4.16. An urn contains 5 tags, of which 3 are red and 2 are green. A tag
is randomly selected from the urn and replaced with a tag of the opposite
color. This continues until only tags of a single color remain in the urn.
Let $X_n$ denote the number of red tags in the urn after the $n$th draw, with
$X_0 = 3$. What is the probability that the game ends with the urn containing
only red tags?


4.17. The damage $X_n$ of a system subjected to wear is a Markov chain
with the transition probability matrix

$$
P = \begin{array}{c|ccc}
  & 0 & 1 & 2 \\ \hline
0 & 0.7 & 0.3 & 0 \\
1 & 0 & 0.6 & 0.4 \\
2 & 0 & 0 & 1
\end{array}
$$

The system starts in state 0 and fails when it first reaches state 2. Let
$T = \min\{n \ge 0;\ X_n = 2\}$ be the time of failure. Use a first step
analysis to evaluate $\phi(s) = E[s^T]$ for a fixed number $0 < s < 1$. (This is
called the generating function of $T$. See Section 9.)


4.18. Time-dependent transition probabilities. A well-disciplined
man, who smokes exactly one half of a cigar each day, buys a box containing
$N$ cigars. He cuts a cigar in half, smokes half, and returns the other
half to the box. In general, on a day in which his cigar box contains $w$
whole cigars and $h$ half cigars, he will pick one of the $w + h$ smokes at
random, each whole and half cigar being equally likely, and if it is a half
cigar, he smokes it. If it is a whole cigar, he cuts it in half, smokes one
piece, and returns the other to the box. What is the expected value of $T$,
the day on which the last whole cigar is selected from the box?


Hint: Let $X_n$ be the number of whole cigars in the box after the $n$th
smoke. Then $X_n$ is a Markov chain whose transition probabilities vary
with $n$. Define $v_n(w) = E[T \mid X_n = w]$. Use a first step analysis to
develop a recursion for $v_n(w)$ and show that the solution is

$$v_n(w) = \frac{2Nw + n + 2w}{w + 1} - \sum_{k=1}^{w} \frac{1}{k},$$

whence

$$E[T] = v_0(N) = 2N - \sum_{k=1}^{N} \frac{1}{k}.$$

4.19. Computer Challenge. Let $N$ be a positive integer and let
$Z_1, \ldots, Z_N$ be independent random variables, each having the geometric
distribution

$$\Pr\{Z = k\} = \left(\frac{1}{2}\right)^{k}, \qquad \text{for } k = 1, 2, \ldots.$$

Since these are discrete random variables, the maximum among them may
be unique, or there may be ties for the maximum. Let $p_N$ be the probability
that the maximum is unique. How does $p_N$ behave when $N$ is large?
(Alternative formulation: You toss $N$ dimes. Those that are heads you set
aside; those that are tails you toss again. You repeat this until all of the
coins are heads. Then $p_N$ is the probability that the last toss was of a
single coin.)


5. Some Special Markov Chains 


We introduce several particular Markov chains that arise in a variety of
applications.


5.1. The Two-State Markov Chain


Let

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & 1 - a & a \\
1 & b & 1 - b
\end{array}
\qquad \text{where } 0 < a, b < 1, \tag{5.1}
$$

be the transition matrix of a two-state Markov chain.

When $a = 1 - b$, so that the rows of $P$ are the same, then the states
$X_1, X_2, \ldots$ are independent identically distributed random variables with
$\Pr\{X_n = 0\} = b$ and $\Pr\{X_n = 1\} = a$. When $a \neq 1 - b$, the probability
distribution for $X_n$ varies depending on the outcome $X_{n-1}$ at the previous
stage.

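For the two-state Markov chain, it is readily verified by induction that
the $n$-step transition matrix is given by

$$
P^n = \frac{1}{a+b} \begin{Vmatrix} b & a \\ b & a \end{Vmatrix}
    + \frac{(1 - a - b)^n}{a+b} \begin{Vmatrix} a & -a \\ -b & b \end{Vmatrix}. \tag{5.2}
$$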


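To verify this general formula, introduce the abbreviations

$$
A = \begin{Vmatrix} b & a \\ b & a \end{Vmatrix}
\qquad \text{and} \qquad
B = \begin{Vmatrix} a & -a \\ -b & b \end{Vmatrix},
$$

so that (5.2) can be written

$$P^n = (a + b)^{-1}\left[A + (1 - a - b)^n B\right].$$

Next, check the multiplications

$$
AP = \begin{Vmatrix} b & a \\ b & a \end{Vmatrix}
     \times \begin{Vmatrix} 1 - a & a \\ b & 1 - b \end{Vmatrix}
   = \begin{Vmatrix} b & a \\ b & a \end{Vmatrix} = A
$$

and

$$
BP = \begin{Vmatrix} a & -a \\ -b & b \end{Vmatrix}
     \times \begin{Vmatrix} 1 - a & a \\ b & 1 - b \end{Vmatrix}
   = \begin{Vmatrix} a - a^2 - ab & a^2 - a + ab \\ -b + ab + b^2 & b - ab - b^2 \end{Vmatrix}
   = (1 - a - b) B.
$$

Now (5.2) is easily seen to be true when $n = 1$, since then

$$
\begin{aligned}
P^1 &= \frac{1}{a+b} \begin{Vmatrix} b & a \\ b & a \end{Vmatrix}
     + \frac{1 - a - b}{a+b} \begin{Vmatrix} a & -a \\ -b & b \end{Vmatrix} \\
    &= \frac{1}{a+b} \begin{Vmatrix} b + a - a^2 - ab & a - a + a^2 + ab \\ b - b + ab + b^2 & a + b - ab - b^2 \end{Vmatrix} \\
    &= \begin{Vmatrix} 1 - a & a \\ b & 1 - b \end{Vmatrix} = P.
\end{aligned}
$$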


To complete an induction proof, assume that the formula is true for $n$.
Then

$$
\begin{aligned}
P^n P &= (a + b)^{-1}\left[A + (1 - a - b)^n B\right] P \\
      &= (a + b)^{-1}\left[AP + (1 - a - b)^n BP\right] \\
      &= (a + b)^{-1}\left[A + (1 - a - b)^{n+1} B\right] = P^{n+1}.
\end{aligned}
$$

We have verified that the formula holds for $n + 1$. It therefore is
established for all $n$.
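
The induction can be complemented by a direct numerical check. The Python
sketch below compares the closed form (5.2) with repeated matrix
multiplication, using exact rational arithmetic. The values $a = 1/4$ and
$b = 1/2$ and the helper names are illustrative assumptions of ours, not
taken from the text.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def closed_form(a, b, n):
    """Right-hand side of (5.2): (a+b)^{-1} [A + (1-a-b)^n B]."""
    s, lam = a + b, (1 - a - b) ** n
    return [[(b + lam * a) / s, (a - lam * a) / s],
            [(b - lam * b) / s, (a + lam * b) / s]]

a, b = F(1, 4), F(1, 2)            # illustrative parameter values
P = [[1 - a, a], [b, 1 - b]]

power = P                          # holds P^n as n increases
checks = []
for n in range(1, 7):
    checks.append(power == closed_form(a, b, n))
    power = matmul(power, P)

print(all(checks))
```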

Note that $|1 - a - b| < 1$ when $0 < a, b < 1$, and thus
$|1 - a - b|^n \to 0$ as $n \to \infty$ and

$$
\lim_{n \to \infty} P^n =
\begin{Vmatrix}
\dfrac{b}{a+b} & \dfrac{a}{a+b} \\[2ex]
\dfrac{b}{a+b} & \dfrac{a}{a+b}
\end{Vmatrix}. \tag{5.3}
$$

This tells us that such a system, in the long run, will be in state 0 with
probability $b/(a + b)$ and in state 1 with probability $a/(a + b)$,
irrespective of the initial state in which the system started.

For a numerical example, suppose that the items produced by a certain
worker are graded as defective or not, and that due to trends in raw
material quality, whether or not a particular item is defective depends in
part on whether or not the previous item was defective. Let $X_n$ denote the
quality of the $n$th item with $X_n = 0$ meaning "good" and $X_n = 1$ meaning
"defective." Suppose that $\{X_n\}$ evolves as a Markov chain whose transition
matrix is

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & 0.99 & 0.01 \\
1 & 0.12 & 0.88
\end{array}
$$

Defective items would tend to appear in bunches in the output of such a
system.

In the long run, the probability that an item produced by this system is
defective is given by $a/(a + b) = 0.01/(0.01 + 0.12) = 0.077$.
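
The long run value can be watched emerging from the dynamics. The Python
sketch below pushes the distribution of $X_n$ forward one item at a time.
The starting distribution (a good first item) and the number of steps are
arbitrary assumptions; by (5.3) the limit does not depend on the start.

```python
a, b = 0.01, 0.12               # entries of the transition matrix above
good, defective = 1.0, 0.0      # assumed start: the first item is good

for _ in range(300):
    # One step of the chain: new distribution = old distribution times P.
    good, defective = (good * (1 - a) + defective * b,
                       good * a + defective * (1 - b))

limit = a / (a + b)             # long run probability of a defective item
print(round(defective, 6), round(limit, 6))
```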


5.2. Markov Chains Defined by 
Independent Random Variables 


Let $\xi$ denote a discrete-valued random variable whose possible values are
the nonnegative integers and where $\Pr\{\xi = i\} = a_i \ge 0$, for
$i = 0, 1, \ldots$, and $\sum_{i=0}^{\infty} a_i = 1$. Let
$\xi_1, \xi_2, \ldots, \xi_n, \ldots$ represent independent observations of $\xi$.

We shall now describe three different Markov chains connected with
the sequence $\xi_1, \xi_2, \ldots$. In each case the state space of the
process is the set of nonnegative integers.


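Example Independent Random Variables Consider the process $X_n$,
$n = 0, 1, 2, \ldots$, defined by $X_n = \xi_n$ ($X_0 = \xi_0$ prescribed). Its
Markov matrix has the form

$$
P = \begin{Vmatrix}
a_0 & a_1 & a_2 & \cdots \\
a_0 & a_1 & a_2 & \cdots \\
a_0 & a_1 & a_2 & \cdots \\
\vdots & \vdots & \vdots &
\end{Vmatrix}. \tag{5.4}
$$

That all rows are identical plainly expresses the fact that the random
variable $X_{n+1}$ is independent of $X_n$.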


Example Successive Maxima The partial maxima of $\xi_1, \xi_2, \ldots$ define
a second important Markov chain. Let

$$\theta_n = \max\{\xi_1, \ldots, \xi_n\}, \qquad \text{for } n = 1, 2, \ldots,$$

with $\theta_0 = 0$. The process defined by $X_n = \theta_n$ is readily seen to be
a Markov chain, and the relation $X_{n+1} = \max\{X_n, \xi_{n+1}\}$ allows the
transition probabilities to be computed to be

$$
P = \begin{Vmatrix}
A_0 & a_1 & a_2 & a_3 & \cdots \\
0 & A_1 & a_2 & a_3 & \cdots \\
0 & 0 & A_2 & a_3 & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{Vmatrix}, \tag{5.5}
$$

where $A_k = a_0 + \cdots + a_k$ for $k = 0, 1, \ldots$.
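
To make the structure of (5.5) concrete, the following Python sketch builds
the matrix for a distribution with finitely many possible values, in which
case the matrix is finite. The function name `maxima_matrix` and the
distribution `a` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

def maxima_matrix(a):
    """Matrix (5.5) for a distribution a[0], ..., a[m-1] on 0, ..., m-1."""
    m = len(a)
    P = [[F(0)] * m for _ in range(m)]
    for i in range(m):
        P[i][i] = sum(a[: i + 1])     # A_i = a_0 + ... + a_i: maximum unchanged
        for j in range(i + 1, m):
            P[i][j] = a[j]            # new observation j exceeds the maximum i
    return P

a = [F(1, 2), F(3, 10), F(1, 5)]      # hypothetical distribution on {0, 1, 2}
P = maxima_matrix(a)
for row in P:
    print([str(x) for x in row], "row sum =", sum(row))
```

Each row sums to one, and the largest possible value is an absorbing state,
since no later observation can exceed it.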

Suppose $\xi_1, \xi_2, \ldots$ represent successive bids on a certain asset that
is offered for sale. Then $X_n = \max\{\xi_1, \ldots, \xi_n\}$ is the maximum that
is bid up to stage $n$. Suppose that the bid that is accepted is the first bid
that equals or exceeds a prescribed level $M$. The time of sale is the random
variable $T = \min\{n \ge 1;\ X_n \ge M\}$. A first step analysis shows that the
mean $\mu = E[T]$ satisfies

$$\mu = 1 + \mu \Pr\{\xi_1 < M\}, \tag{5.6}$$

or $\mu = 1/\Pr\{\xi_1 \ge M\} = 1/(a_M + a_{M+1} + \cdots)$. The first step
analysis invoked in establishing (5.6) considers the two possibilities
$\{\xi_1 < M\}$ and $\{\xi_1 \ge M\}$. With this breakdown, the law of total
probability justifies the sum

$$E[T] = E[T \mid \xi_1 \ge M] \Pr\{\xi_1 \ge M\} + E[T \mid \xi_1 < M] \Pr\{\xi_1 < M\}. \tag{5.7}$$

Clearly, $E[T \mid \xi_1 \ge M] = 1$, since no further bids are examined in this
case. On the other hand, when $\xi_1 < M$ we have the first bid, which was not
accepted, plus some future bids. The future bids $\xi_2, \xi_3, \ldots$ have the
same probabilistic properties as in the original problem, and they are
examined until the first acceptable bid appears. This reasoning leads to
$E[T \mid \xi_1 < M] = 1 + \mu$. Substitution into (5.7) then yields (5.6) as
follows:

$$
\begin{aligned}
E[T] &= 1 \times \Pr\{\xi_1 \ge M\} + (1 + \mu) \Pr\{\xi_1 < M\} \\
     &= 1 + \mu \Pr\{\xi_1 < M\}.
\end{aligned}
$$

To restate the argument somewhat differently, one always examines the
first bid $\xi_1$. If $\xi_1 < M$, then further bids are examined in a future
that is probabilistically similar to the original problem. That is, when
$\xi_1 < M$, then on the average $\mu$ bids in addition to $\xi_1$ must be
examined before an acceptable bid appears. Equation (5.6) results.
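
A small numerical illustration may help here. The Python sketch below takes
a hypothetical bid distribution and level $M$, computes $\mu$ from the
closed form, confirms that it satisfies (5.6), and compares it with a direct
summation of $\sum_n n \Pr\{T = n\}$. All numerical values are assumptions
chosen for illustration.

```python
from fractions import Fraction as F

# Hypothetical distribution of a single bid on the values 0, 1, 2, 3.
a = [F(1, 5), F(3, 10), F(2, 5), F(1, 10)]
M = 2                                  # assumed acceptance level

p_accept = sum(a[M:])                  # Pr{xi >= M}
p_reject = 1 - p_accept                # Pr{xi < M}
mu = 1 / p_accept                      # solution of (5.6)

# First step equation (5.6): mu = 1 + mu * Pr{xi < M}.
satisfies_first_step = (mu == 1 + mu * p_reject)

# Direct evaluation of E[T]; T = n means n - 1 rejections, then acceptance.
series = sum(n * p_reject ** (n - 1) * p_accept for n in range(1, 201))

print(mu, satisfies_first_step, float(series))
```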


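Example Partial Sums Another important Markov chain arises from
consideration of the successive partial sums $\eta_n$ of the $\xi_i$, i.e.,

$$\eta_n = \xi_1 + \cdots + \xi_n, \qquad n = 1, 2, \ldots,$$

and by definition, $\eta_0 = 0$. The process $X_n = \eta_n$ is readily seen to be
a Markov chain via

$$
\begin{aligned}
\Pr\{X_{n+1} = j \mid X_1 = i_1, &\ldots, X_{n-1} = i_{n-1}, X_n = i\} \\
 &= \Pr\{\xi_{n+1} = j - i \mid \xi_1 = i_1, \xi_2 = i_2 - i_1, \ldots, \xi_n = i - i_{n-1}\} \\
 &= \Pr\{\xi_{n+1} = j - i\} \qquad \text{(independence of } \xi_1, \xi_2, \ldots) \\
 &= \Pr\{X_{n+1} = j \mid X_n = i\}.
\end{aligned}
$$

The transition probability matrix is determined by

$$
\begin{aligned}
\Pr\{X_{n+1} = j \mid X_n = i\}
 &= \Pr\{\xi_1 + \cdots + \xi_{n+1} = j \mid \xi_1 + \cdots + \xi_n = i\} \\
 &= \Pr\{\xi_{n+1} = j - i\} \\
 &= \begin{cases} a_{j-i} & \text{for } j \ge i, \\ 0 & \text{for } j < i, \end{cases}
\end{aligned}
$$

where we have used the independence of the $\xi_i$.
Schematically, we have

$$
P = \begin{Vmatrix}
a_0 & a_1 & a_2 & a_3 & \cdots \\
0 & a_0 & a_1 & a_2 & \cdots \\
0 & 0 & a_0 & a_1 & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{Vmatrix}. \tag{5.8}
$$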
If the possible values of the random variable $\xi$ are permitted to be the
positive and negative integers, then the possible values of $\eta_n$ for each $n$
will be contained among the totality of all integers. Instead of labeling the
states conventionally by means of the nonnegative integers, it is more
convenient to identify the state space with the totality of integers, since the
transition probability matrix will then appear in a more symmetric form.
The state space consists then of the values $\ldots, -2, -1, 0, 1, 2, \ldots$.
The transition probability matrix becomes the doubly infinite matrix with
entries

$$P_{ij} = a_{j-i}, \qquad i, j = 0, \pm 1, \pm 2, \ldots,$$

where $\Pr\{\xi = k\} = a_k$ for $k = 0, \pm 1, \pm 2, \ldots$, and $a_k \ge 0$,
$\sum_{k=-\infty}^{\infty} a_k = 1$.


5.3. One-Dimensional Random Walks 


When we discuss random walks, it is an aid to intuition to speak about the 
state of the system as the position of a moving “particle.” 

A one-dimensional random walk is a Markov chain whose state space
is a finite or infinite subset $a, a + 1, \ldots, b$ of the integers, in which
the particle, if it is in state $i$, can in a single transition either stay in
$i$ or move to one of the neighboring states $i - 1$, $i + 1$. If the state
space is taken as the nonnegative integers, the transition matrix of a random
walk has the form

$$
P = \begin{array}{c|ccccccc}
  & 0 & 1 & 2 & \cdots & i-1 & i & i+1 \\ \hline
0 & r_0 & p_0 & 0 & \cdots & & & \\
1 & q_1 & r_1 & p_1 & \cdots & & & \\
2 & 0 & q_2 & r_2 & \cdots & & & \\
\vdots & & & & \ddots & & & \\
i & & & & & q_i & r_i & p_i \\
\vdots & & & & & & & \ddots
\end{array} \tag{5.9}
$$

where $p_i > 0$, $q_i > 0$, $r_i \ge 0$, and $q_i + r_i + p_i = 1$ for
$i = 1, 2, \ldots$ ($i \ge 1$), and $p_0 \ge 0$, $r_0 \ge 0$, $r_0 + p_0 = 1$.
Specifically, if $X_n = i$, then for $i \ge 1$,

$$
\begin{aligned}
\Pr\{X_{n+1} = i + 1 \mid X_n = i\} &= p_i, \\
\Pr\{X_{n+1} = i - 1 \mid X_n = i\} &= q_i, \\
\Pr\{X_{n+1} = i \mid X_n = i\} &= r_i,
\end{aligned}
$$

with the obvious modifications holding for $i = 0$.

The designation “random walk” seems apt, since a realization of the 
process describes the path of a person (suitably intoxicated) moving ran- 
domly one step forward or backward. 

The fortune of a player engaged in a series of contests is often depicted
by a random walk process. Specifically, suppose an individual (player A)
with fortune $k$ plays a game against an infinitely rich adversary and has
probability $p_k$ of winning one unit and probability $q_k = 1 - p_k$
($k \ge 1$) of losing one unit in the next contest (the choice of the contest
at each stage may depend on his fortune), and $r_0 = 1$. The process $X_n$,
where $X_n$ represents his fortune after $n$ contests, is clearly a random
walk. Note that once the state 0 is reached (i.e., player A is wiped out), the
process remains in that state. The event of reaching state $k = 0$ is commonly
known as the "gambler's ruin."

If the adversary, player B, also starts with a limited fortune $l$ and player
A has an initial fortune $k$ ($k + l = N$), then we may again consider the
Markov chain process $X_n$ representing player A's fortune. However, the
states of the process are now restricted to the values $0, 1, 2, \ldots, N$. At
any trial, $N - X_n$ is interpreted as player B's fortune. If we allow the
possibility of neither player winning in a contest, the transition probability
matrix takes the form

$$
P = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & \cdots & N \\ \hline
0 & 1 & 0 & 0 & 0 & \cdots & 0 \\
1 & q_1 & r_1 & p_1 & 0 & \cdots & 0 \\
2 & 0 & q_2 & r_2 & p_2 & \cdots & 0 \\
\vdots & & & \ddots & \ddots & \ddots & \\
N-1 & 0 & \cdots & \cdots & q_{N-1} & r_{N-1} & p_{N-1} \\
N & 0 & \cdots & \cdots & 0 & 0 & 1
\end{array} \tag{5.10}
$$

Again $p_i$ ($q_i$), $i = 1, 2, \ldots, N - 1$, denotes the probability of player
A's fortune increasing (decreasing) by 1 at the subsequent trial when his
present fortune is $i$, and $r_i$ may be interpreted as the probability of a
draw. Note that, in accordance with the Markov chain given in (5.10), when
player A's fortune (the state of the process) reaches 0 or $N$ it remains in
this same state forever. We say player A is ruined when the state of the
process reaches 0, and player B is ruined when the state of the process
reaches $N$.

The probability of gambler’s ruin (for player A) is derived in the next 
section by solving a first step analysis. Some more complex functionals on 
random walk processes are also derived there. 

The random walk corresponding to $p_k = p$, $q_k = 1 - p = q$ for all
$k \ge 1$ and $r_0 = 1$ describes the situation of identical contests. There is
a definite advantage to player A in each individual trial if $p > q$, and
conversely, an advantage to player B if $p < q$. A "fair" contest corresponds
to $p = q = \frac{1}{2}$. Suppose the total of both players' fortunes is $N$.
Then the corresponding walk, where $X_n$ is player A's fortune at stage $n$,
has the transition probability matrix

$$
P = \begin{array}{c|ccccccc}
  & 0 & 1 & 2 & 3 & \cdots & N-1 & N \\ \hline
0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\
1 & q & 0 & p & 0 & \cdots & 0 & 0 \\
2 & 0 & q & 0 & p & \cdots & 0 & 0 \\
\vdots & & & & & \ddots & & \\
N-1 & 0 & 0 & 0 & 0 & \cdots\ q & 0 & p \\
N & 0 & 0 & 0 & 0 & \cdots & 0 & 1
\end{array} \tag{5.11}
$$

Let $u_i = U_{i0}$ be the probability of gambler's ruin starting with the
initial fortune $i$. Then $u_i$ is the probability that the random walk reaches
state 0 before reaching state $N$, starting from $X_0 = i$. The first step
analysis of Section 4, as used in deriving equation (4.8), shows that these
ruin probabilities satisfy

$$u_i = p u_{i+1} + q u_{i-1} \qquad \text{for } i = 1, \ldots, N - 1 \tag{5.12}$$

together with the obvious boundary conditions

$$u_0 = 1 \qquad \text{and} \qquad u_N = 0.$$


These equations are solved in the next section following a straightforward
but arduous method. There it is shown that the gambler's ruin
probabilities corresponding to the transition probability matrix given in
(5.11) are

$$
\begin{aligned}
u_i &= \Pr\{X_n \text{ reaches state 0 before state } N \mid X_0 = i\} \\
    &= \begin{cases}
       \dfrac{N - i}{N} & \text{when } p = q = \tfrac{1}{2}, \\[2ex]
       \dfrac{(q/p)^i - (q/p)^N}{1 - (q/p)^N} & \text{when } p \neq q.
       \end{cases}
\end{aligned} \tag{5.13}
$$

The ruin probabilities $u_i$ given by (5.13) have the following
interpretation. In a game in which player A begins with an initial fortune of
$i$ units and player B begins with $N - i$ units, the probability that player A
loses all his money before player B goes broke is given by $u_i$, where $p$ is
the probability that player A wins in a single contest. If player B is
infinitely rich ($N \to \infty$), then passing to the limit in (5.13) and using
$(q/p)^N \to \infty$ as $N \to \infty$ if $p < q$, while $(q/p)^N \to 0$ if
$p > q$, we see that the ruin probabilities become

$$
u_i = \begin{cases}
1 & \text{if } p \le q, \\[1ex]
\left(\dfrac{q}{p}\right)^{i} & \text{if } p > q.
\end{cases} \tag{5.14}
$$

(In passing to the limit, the case $p = q = \frac{1}{2}$ must be treated
separately.) We see that ruin is certain ($u_i = 1$) against an infinitely rich
adversary when the game is unfavorable ($p < q$), and even when the game is
fair ($p = q$). In a favorable game ($p > q$), starting with initial fortune
$i$, ruin occurs (player A goes broke) with probability $(q/p)^i$. This ruin
probability decreases as the initial fortune $i$ increases. In a favorable
game against an infinitely rich opponent, with probability $1 - (q/p)^i$
player A's fortune increases, in the long run, without limit.
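
The closed form for the ruin probabilities can be checked against the first
step equations it is meant to solve. The Python sketch below evaluates the
formula with exact rational arithmetic, verifies the recursion
$u_i = p u_{i+1} + q u_{i-1}$ with the boundary values $u_0 = 1$ and
$u_N = 0$, and compares a large-$N$ value with $(q/p)^i$. The function name
and the particular values of $N$ and $p$ are our own illustrative
assumptions.

```python
from fractions import Fraction as F

def ruin_probability(i, N, p):
    """Probability of reaching 0 before N from fortune i, per the closed form."""
    q = 1 - p
    if p == q:                       # the fair game is treated separately
        return F(N - i, N)
    rho = q / p
    return (rho ** i - rho ** N) / (1 - rho ** N)

N = 10
results = {}
for p in (F(1, 2), F(3, 5), F(2, 5)):
    q = 1 - p
    u = [ruin_probability(i, N, p) for i in range(N + 1)]
    # Boundary conditions and the first step recursion, checked exactly.
    results[p] = (u[0] == 1 and u[N] == 0 and
                  all(u[i] == p * u[i + 1] + q * u[i - 1] for i in range(1, N)))
    print(p, results[p], float(u[5]))

# Against a very rich opponent in a favorable game, u_i approaches (q/p)^i.
approx = float(ruin_probability(3, 200, F(3, 5)))
print(approx, (2 / 3) ** 3)
```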

More complex gambler’s-ruin-type problems find practical relevance in 
certain models describing the fluctuation of insurance company assets 
over time.

Random walks are not only useful in simulating situations of gambling
but frequently serve as reasonable discrete approximations to physical
processes describing the motion of diffusing particles. If a particle is
subjected to collisions and random impulses, then its position fluctuates
randomly, although the particle describes a continuous path. If the future
position (i.e., its probability distribution) of the particle depends only on
the present position, then the process $X_t$, where $X_t$ is the position at
time $t$, is Markov. A discrete approximation to such a continuous motion
corresponds to a random walk. A classical discrete version of Brownian motion
(VIII) is provided by the symmetric random walk. By a symmetric random walk
on the integers (say all the integers) we mean a Markov chain with state
space the totality of all integers and whose transition probability matrix
has the elements

$$
P_{ij} = \begin{cases}
p & \text{if } j = i + 1, \\
p & \text{if } j = i - 1, \\
r & \text{if } j = i, \\
0 & \text{otherwise,}
\end{cases}
\qquad i, j = 0, \pm 1, \pm 2, \ldots,
$$

where $p > 0$, $r \ge 0$, and $2p + r = 1$. Conventionally, "simple random
walk" refers only to the case $r = 0$, $p = \frac{1}{2}$.

The classical simple random walk in $n$ dimensions admits the following
formulation. The state space is identified with the set of all integral
lattice points in $E^n$ (Euclidean $n$ space); that is, a state is an
$n$-tuple $k = (k_1, k_2, \ldots, k_n)$ of integers. The transition probability
matrix is defined by

$$
P_{kl} = \begin{cases}
\dfrac{1}{2n} & \text{if } \displaystyle\sum_{i=1}^{n} |l_i - k_i| = 1, \\[2ex]
0 & \text{otherwise.}
\end{cases}
$$

Analogous to the one-dimensional case, the simple random walk in $E^n$
represents a discrete version of $n$-dimensional Brownian motion.


5.4. Success Runs 


Consider a Markov chain on the nonnegative integers with transition
probability matrix of the form

$$P = \begin{array}{c||cccccc} & 0 & 1 & 2 & 3 & 4 & \cdots \\ \hline 0 & p_0 & q_0 & 0 & 0 & 0 & \cdots \\ 1 & p_1 & r_1 & q_1 & 0 & 0 & \cdots \\ 2 & p_2 & 0 & r_2 & q_2 & 0 & \cdots \\ 3 & p_3 & 0 & 0 & r_3 & q_3 & \cdots \\ \vdots & \vdots & & & & & \end{array} \tag{5.15}$$

where $q_i > 0$, $p_i > 0$, and $p_i + q_i + r_i = 1$ for $i = 0, 1, 2, \ldots$.
The zero state plays a distinguished role in that it can be reached in one
transition from any other state, while state $i + 1$ can be reached only
from state $i$.

This example arises surprisingly often in applications and at the same 
time is very easy to compute with. We will frequently illustrate concepts 
and results in terms of it. 

A special case of this transition matrix arises when one is dealing with
success runs resulting from repeated trials each of which admits two
possible outcomes, success $S$ or failure $F$. More explicitly, consider a
sequence of trials with two possible outcomes $S$ or $F$. Moreover, suppose
that in each trial, the probability of $S$ is $\alpha$ and the probability
of $F$ is $\beta = 1 - \alpha$. We say a success run of length $r$ happened
at trial $n$ if the outcomes in the preceding $r + 1$ trials, including the
present trial as the last, were respectively $F, S, S, \ldots, S$. Let us
now label the present state of the process by the length of the success run
currently under way. In particular, if the last trial resulted in a failure,
then the state is zero. Similarly, when the preceding $r + 1$ trials in
order have the outcomes $F, S, S, \ldots, S$, the state variable would carry
the label $r$. The process is clearly Markov (since the individual trials
were independent of each other), and its transition matrix has the form
(5.15), where

$$p_n = \beta, \quad r_n = 0, \quad \text{and} \quad q_n = \alpha \qquad \text{for } n = 0, 1, 2, \ldots.$$
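The labeling rule just described can be sketched in a few lines of Python. This is only an illustration: the function name `success_run_states` and the single-character encoding of outcomes are assumptions made here, and the process is assumed to start in state 0 (as if a failure preceded the first trial).

```python
def success_run_states(outcomes):
    """Map a sequence of trial outcomes ('S' or 'F') to success-run lengths.

    Assumption for this sketch: the chain starts in state 0, i.e. the
    sequence is treated as if a failure occurred just before the first trial.
    """
    states = []
    run = 0
    for outcome in outcomes:
        if outcome == "S":
            run += 1   # success: state r moves to r + 1 (probability alpha)
        elif outcome == "F":
            run = 0    # failure: the chain returns to state 0 (probability beta)
        else:
            raise ValueError("outcomes must contain only 'S' or 'F'")
        states.append(run)
    return states


# Hypothetical outcome sequence, chosen only for illustration.
example_states = success_run_states("FSSFSSS")
```

Each failure resets the label to zero and each success increases it by one, which is exactly the pattern of transitions encoded by $p_n = \beta$ and $q_n = \alpha$.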


A second example is furnished by the current age in a renewal process.
Consider a light bulb whose lifetime, measured in discrete units, is a
random variable $\xi$, where

$$\Pr\{\xi = k\} = a_k > 0 \quad \text{for } k = 1, 2, \ldots, \qquad \sum_{k=1}^{\infty} a_k = 1.$$

Let each bulb be replaced by a new one when it burns out. Suppose the
first bulb lasts until time $\xi_1$, the second bulb until time
$\xi_1 + \xi_2$, and the $n$th bulb until time $\xi_1 + \cdots + \xi_n$,
where the individual lifetimes $\xi_1, \xi_2, \ldots$ are independent random
variables each having the same distribution as $\xi$. Let $X_n$ be the age
of the bulb in service at time $n$. This current age process is depicted in
Figure 5.1.

By convention we set $X_n = 0$ at the time of a failure.

The current age is a success run Markov process for which

$$p_k = \frac{a_{k+1}}{a_{k+1} + a_{k+2} + \cdots}, \quad r_k = 0, \quad q_k = 1 - p_k \qquad \text{for } k = 0, 1, \ldots. \tag{5.16}$$

We reason as follows: The age process reverts to zero upon failure of the
item in service. Given that the age of the item in current service is $k$,
then failure occurs in the next time period with conditional probability
$p_k = a_{k+1}/(a_{k+1} + a_{k+2} + \cdots)$. Given that the item has
survived $k$ periods, it survives at least to the next period with the
remaining probability $q_k = 1 - p_k$.
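As a rough numerical illustration of (5.16), the following Python sketch computes the pairs $(p_k, q_k)$ from a lifetime distribution. The function name and the example distribution are hypothetical, and the sketch assumes a lifetime with finite support so that the tail sums are finite lists (the text itself allows $a_k > 0$ for all $k$).

```python
from fractions import Fraction


def current_age_probabilities(a):
    """Return [(p_0, q_0), (p_1, q_1), ...] for the current age chain.

    a[k - 1] holds a_k = Pr{xi = k} for k = 1, ..., len(a).
    Assumption for this sketch: the lifetime has finite support.
    """
    tail = sum(a)  # a_{k+1} + a_{k+2} + ... , starting with k = 0
    result = []
    for k in range(len(a)):
        p_k = a[k] / tail          # a_{k+1} / (a_{k+1} + a_{k+2} + ...)
        result.append((p_k, 1 - p_k))
        tail -= a[k]               # drop a_{k+1} before moving to age k + 1
    return result


# Hypothetical lifetime distribution: Pr{xi=1}=1/2, Pr{xi=2}=1/4, Pr{xi=3}=1/4.
lifetimes = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
age_probs = current_age_probabilities(lifetimes)
```

With this particular distribution a bulb of age 2 is certain to fail in the next period, so the last pair has $p_2 = 1$ and $q_2 = 0$.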

Renewal processes are extensively discussed in VII. 


Figure 5.1 The current age $X_n$ in a renewal process. Here $\xi_1 = 3$,
$\xi_2 = 2$, and $\xi_3 = 3$. (Vertical axis: current age; horizontal axis:
time $n$.)

Exercises


5.1. The probability of the thrower winning in the dice game called 
“craps” is p = 0.4929. Suppose Player A is the thrower and begins the 
game with $5, and Player B, his opponent, begins with $10. What is the 
probability that Player A goes bankrupt before Player B? Assume that the 
bet is $1 per round. 


Hint: Use equation (5.13). 


5.2. Determine the gambler’s ruin probability for Player A when both 
players begin with $50, bet $1 on each play, and where the win probabil- 
ity for Player A in each game is 


(a) p = 0.49292929 
(b) p = 0.5029237 


(See II, Section 2.) 
What are the gambler’s ruin probabilities when each player begins with 
$500? 


5.3. Determine $P^n$ for $n = 2, 3, 4, 5$ for the Markov chain whose
transition probability matrix is

$$P = \begin{Vmatrix} 0.4 & 0.6 \\ 0.7 & 0.3 \end{Vmatrix}.$$

5.4. A coin is tossed repeatedly until three heads in a row appear. Let
$X_n$ record the current number of successive heads that have appeared.
That is, $X_n = 0$ if the $n$th toss resulted in tails; $X_n = 1$ if the
$n$th toss was heads and the $(n - 1)$st toss was tails; and so on. Model
$X_n$ as a success runs Markov chain by specifying the probabilities $p_i$
and $q_i$.


5.5. Suppose that the items produced by a certain process are each 
graded as defective or good, and that whether or not a particular item is 
defective or good depends on the quality of the previous item. To be spe- 
cific, suppose that a defective item is followed by another defective item 
with probability 0.80, whereas a good item is followed by another good 
item with probability 0.95. Suppose that the initial (zeroth) item is good. 
Using equation (5.2), determine the probability that the eighth item is
good, and verify this by computing the eighth matrix power of the
transition probability matrix.


5.6. A baseball trading card that you have for sale may be quite valuable.
Suppose that the successive bids $\xi_1, \xi_2, \ldots$ that you receive are
independent random variables with the geometric distribution

$$\Pr\{\xi = k\} = 0.01(0.99)^k \quad \text{for } k = 0, 1, \ldots.$$


If you decide to accept any bid over $100, how many bids, on the aver- 
age, will you receive before an acceptable bid appears? 


Hint: Review the discussion surrounding equation (5.6). 


5.7. Consider the random walk Markov chain whose transition proba- 
bility matrix is given by 


Starting in state 1, determine the probability that the process is absorbed 
into state 0. Do this first using the basic first step approach of equations 
(4.3) and (4.4), and second using the particular results for a random walk 
given in equation (5.13). 


5.8. As a special case, consider a discrete-time queueing model in
which at most a single customer arrives in any period and at most a single
customer completes service. Suppose that in any single period, a single
customer arrives with probability $\alpha$, and no customers arrive with
probability $1 - \alpha$. Provided that there are customers in the system,
in a single period a single customer completes service with probability
$\beta$, and no customers leave with probability $1 - \beta$. Then $X_n$,
the number of customers in the system at the end of period $n$, is a random
walk in the sense of Section 5.3. Referring to equation (5.9), specify the
transition probabilities $p_i$, $q_i$, and $r_i$ for $i = 0, 1, \ldots$.

5.9. In a simplified model of a certain television game show, suppose
that the contestant, having won $k$ dollars, will at the next play have
$k + 1$ dollars with probability $q$ and be put out of the game and leave
with nothing with probability $p = 1 - q$. Suppose that the contestant
begins with one dollar. Model her winnings after $n$ plays as a success runs
Markov chain by specifying the transition probabilities $p_i$, $q_i$, and
$r_i$ in equation (5.15).


Problems 


5.1. As a special case of the successive maxima Markov chain whose
transition probabilities are given in equation (5.5), consider the Markov
chain whose transition probability matrix is given by

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & a_0 & a_1 & a_2 & a_3 \\ 1 & 0 & a_0 + a_1 & a_2 & a_3 \\ 2 & 0 & 0 & a_0 + a_1 + a_2 & a_3 \\ 3 & 0 & 0 & 0 & 1 \end{array}$$

Starting in state 0, show that the mean time until absorption is
$v_0 = 1/a_3$.


5.2. A component of a computer has an active life, measured in discrete
units, that is a random variable $T$, where $\Pr\{T = k\} = a_k$ for
$k = 1, 2, \ldots$. Suppose one starts with a fresh component, and each
component is replaced by a new component upon failure. Let $X_n$ be the age
of the component in service at time $n$. Then $\{X_n\}$ is a success runs
Markov chain.

(a) Specify the probabilities $p_i$ and $q_i$.

(b) A “planned replacement” policy calls for replacing the component
upon its failure or upon its reaching age $N$, whichever occurs first.
Specify the success runs probabilities $p_i$ and $q_i$ under the planned
replacement policy.


5.3. A Batch Processing Model. Customers arrive at a facility and
wait there until $K$ customers have accumulated. Upon the arrival of the
$K$th customer, all are instantaneously served, and the process repeats. Let
$\xi_1, \xi_2, \ldots$ denote the arrivals in successive periods, assumed to
be independent random variables whose distribution is given by

$$\Pr\{\xi_k = 0\} = \alpha, \qquad \Pr\{\xi_k = 1\} = 1 - \alpha,$$

where $0 < \alpha < 1$. Let $X_n$ denote the number of customers in the
system at time $n$. Then $\{X_n\}$ is a Markov chain on the states
$0, 1, \ldots, K - 1$. With $K = 3$, give the transition probability matrix
for $\{X_n\}$. Be explicit about any assumptions you make.


5.4. Martha has a fair die with the usual six sides. She throws the die 
and records the number. She throws the die again and adds the second 
number to the first. She repeats this until the cumulative sum of all the 
tosses first exceeds 10. What is the probability that she stops at a cumula- 
tive sum of 13? 


5.5. Let $\{X_n\}$ be a random walk for which zero is an absorbing state and
such that from a positive state, the process is equally likely to go up or
down one unit. The transition probability matrix is given by (5.9) with
$r_0 = 1$ and $p_i = q_i = \frac{1}{2}$ for $i \ge 1$. (a) Show that
$\{X_n\}$ is a nonnegative martingale. (b) Use the maximal inequality II,
(5.7) to limit the probability that the process ever gets as high as
$N > 0$.


6. Functionals of Random Walks and Success Runs 


Consider first the random walk on $N + 1$ states whose transition
probability matrix is given by

$$P = \begin{array}{c||cccccc} & 0 & 1 & 2 & 3 & \cdots & N \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 1 & q & 0 & p & 0 & \cdots & 0 \\ 2 & 0 & q & 0 & p & \cdots & 0 \\ \vdots & & & & & & \vdots \\ N & 0 & 0 & 0 & 0 & \cdots & 1 \end{array}$$

“Gambler’s ruin” is the event that the process reaches state 0 before
reaching state $N$. This event can be stated more formally if we introduce
the concept of hitting time. Let $T$ be the (random) time that the process
first reaches, or hits, state 0 or $N$. In symbols,

$$T = \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = N\}.$$


The random time T is shown in Figure 6.1 in a typical case. 


Figure 6.1 The hitting time to 0 or $N$. As depicted here, state 0 was
reached first.


In terms of $T$, the event written as $X_T = 0$ is the event of gambler’s
ruin, and the probability of this event starting from the initial state $k$
is

$$u_k = \Pr\{X_T = 0 \mid X_0 = k\}.$$

Figure 6.2 shows the first step analysis that leads to the equations

$$u_k = p u_{k+1} + q u_{k-1} \quad \text{for } k = 1, \ldots, N - 1, \tag{6.1}$$

with the obvious boundary conditions

$$u_0 = 1, \qquad u_N = 0.$$

Figure 6.2 First step analysis for the gambler’s ruin problem.


Equations (6.1) yield to straightforward but tedious manipulations. Be- 
cause the approach has considerable generality and arises frequently, it is 
well worth pursuing in this simplest case. 

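We begin the solution by introducing the differences $x_k = u_k - u_{k-1}$
for $k = 1, \ldots, N$. Using $p + q = 1$ to write
$u_k = (p + q)u_k = p u_k + q u_k$, equations (6.1) become

$$\begin{aligned}
k = 1; \quad & 0 = p(u_2 - u_1) - q(u_1 - u_0) = p x_2 - q x_1; \\
k = 2; \quad & 0 = p(u_3 - u_2) - q(u_2 - u_1) = p x_3 - q x_2; \\
k = 3; \quad & 0 = p(u_4 - u_3) - q(u_3 - u_2) = p x_4 - q x_3; \\
& \vdots \\
k = N - 1; \quad & 0 = p(u_N - u_{N-1}) - q(u_{N-1} - u_{N-2}) = p x_N - q x_{N-1};
\end{aligned}$$

or

$$\begin{aligned}
x_2 &= (q/p) x_1, \\
x_3 &= (q/p) x_2 = (q/p)^2 x_1, \\
x_4 &= (q/p) x_3 = (q/p)^3 x_1, \\
& \vdots \\
x_k &= (q/p) x_{k-1} = (q/p)^{k-1} x_1, \\
& \vdots \\
x_N &= (q/p) x_{N-1} = (q/p)^{N-1} x_1.
\end{aligned}$$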


We now recover $u_0, u_1, \ldots, u_N$ by invoking the conditions $u_0 = 1$,
$u_N = 0$ and summing the $x_k$’s:

$$\begin{aligned}
x_1 &= u_1 - u_0 = u_1 - 1, \\
x_2 &= u_2 - u_1, & x_1 + x_2 &= u_2 - 1, \\
x_3 &= u_3 - u_2, & x_1 + x_2 + x_3 &= u_3 - 1, \\
& \vdots \\
x_k &= u_k - u_{k-1}, & x_1 + \cdots + x_k &= u_k - 1, \\
& \vdots \\
x_N &= u_N - u_{N-1} = -u_{N-1}, & x_1 + \cdots + x_N &= u_N - 1 = -1.
\end{aligned}$$

The equation for general $k$ gives

$$\begin{aligned}
u_k &= 1 + x_1 + x_2 + \cdots + x_k \\
&= 1 + x_1 + (q/p)x_1 + \cdots + (q/p)^{k-1}x_1 \\
&= 1 + [1 + (q/p) + \cdots + (q/p)^{k-1}]x_1,
\end{aligned} \tag{6.2}$$

which expresses $u_k$ in terms of the as yet undetermined $x_1$. But
$u_N = 0$ gives

$$0 = 1 + [1 + (q/p) + \cdots + (q/p)^{N-1}]x_1,$$

or

$$x_1 = -\frac{1}{1 + (q/p) + \cdots + (q/p)^{N-1}},$$

which substituted into (6.2) gives

$$u_k = 1 - \frac{1 + (q/p) + \cdots + (q/p)^{k-1}}{1 + (q/p) + \cdots + (q/p)^{N-1}}.$$

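The geometric series sums to

$$1 + (q/p) + \cdots + (q/p)^{k-1} = \begin{cases} k & \text{if } p = q = \frac{1}{2}, \\[4pt] \dfrac{1 - (q/p)^k}{1 - (q/p)} & \text{if } p \ne q, \end{cases}$$

whence

$$u_k = \begin{cases} 1 - (k/N) = (N - k)/N & \text{when } p = q = \frac{1}{2}, \\[4pt] 1 - \dfrac{1 - (q/p)^k}{1 - (q/p)^N} = \dfrac{(q/p)^k - (q/p)^N}{1 - (q/p)^N} & \text{when } p \ne q. \end{cases} \tag{6.3}$$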
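Formula (6.3) is easy to evaluate exactly. The sketch below is a minimal Python illustration, not part of the original derivation; the function name `ruin_probability` and the example parameters are assumptions chosen here, and exact `Fraction` arithmetic is used so that the first step equations (6.1) can be checked without rounding.

```python
from fractions import Fraction


def ruin_probability(k, N, p):
    """Gambler's ruin probability u_k from (6.3); p should be a Fraction."""
    q = 1 - p
    if p == q:
        return Fraction(N - k, N)      # fair game: u_k = (N - k)/N
    theta = q / p
    return (theta**k - theta**N) / (1 - theta**N)


# Hypothetical example: win probability 2/5 per play, total fortune N = 5.
p = Fraction(2, 5)
N = 5
u = [ruin_probability(k, N, p) for k in range(N + 1)]

# Each interior value should satisfy u_k = p*u_{k+1} + q*u_{k-1}, as in (6.1).
residuals = [u[k] - (p * u[k + 1] + (1 - p) * u[k - 1]) for k in range(1, N)]
```

Every entry of `residuals` should be exactly zero, and the list `u` starts at 1 and ends at 0, matching the boundary conditions.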
A similar approach works to evaluate the mean duration

$$v_i = E[T \mid X_0 = i]. \tag{6.4}$$

The time $T$ is composed of a first step plus the remaining steps. With
probability $p$ the first step is to state $i + 1$, and then the remainder,
on the average, is $v_{i+1}$ additional steps. With probability $q$ the
first step is to $i - 1$ and then, on the average, there are $v_{i-1}$
further steps. Thus for the mean duration a first step analysis leads to the
equation

$$v_i = 1 + p v_{i+1} + q v_{i-1} \quad \text{for } i = 1, \ldots, N - 1. \tag{6.5}$$

Of course, the game ends in states 0 and $N$, and thus

$$v_0 = 0, \qquad v_N = 0.$$


We will solve equations (6.5) when $p = q = \frac{1}{2}$. The solution for other
values of p proceeds in a similar manner, and the solution for a general 
random walk is given later in this section. 

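Again we introduce the differences $x_k = v_k - v_{k-1}$ for
$k = 1, \ldots, N$, writing (6.5) in the form

$$\begin{aligned}
k = 1; \quad & -1 = \tfrac{1}{2}(v_2 - v_1) - \tfrac{1}{2}(v_1 - v_0) = \tfrac{1}{2}x_2 - \tfrac{1}{2}x_1; \\
k = 2; \quad & -1 = \tfrac{1}{2}(v_3 - v_2) - \tfrac{1}{2}(v_2 - v_1) = \tfrac{1}{2}x_3 - \tfrac{1}{2}x_2; \\
k = 3; \quad & -1 = \tfrac{1}{2}(v_4 - v_3) - \tfrac{1}{2}(v_3 - v_2) = \tfrac{1}{2}x_4 - \tfrac{1}{2}x_3; \\
& \vdots \\
k = N - 1; \quad & -1 = \tfrac{1}{2}(v_N - v_{N-1}) - \tfrac{1}{2}(v_{N-1} - v_{N-2}) = \tfrac{1}{2}x_N - \tfrac{1}{2}x_{N-1}.
\end{aligned}$$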

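The right side forms a collapsing sum. Upon adding, we obtain

$$\begin{aligned}
k = 1; \quad & -1 = \tfrac{1}{2}x_2 - \tfrac{1}{2}x_1; \\
k = 2; \quad & -2 = \tfrac{1}{2}x_3 - \tfrac{1}{2}x_1; \\
k = 3; \quad & -3 = \tfrac{1}{2}x_4 - \tfrac{1}{2}x_1; \\
& \vdots \\
k = N - 1; \quad & -(N - 1) = \tfrac{1}{2}x_N - \tfrac{1}{2}x_1.
\end{aligned}$$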


The general line gives $x_k = x_1 - 2(k - 1)$ for $k = 2, 3, \ldots, N$. We
return to the $v_k$’s by means of

$$\begin{aligned}
x_1 &= v_1 - v_0 = v_1; \\
x_2 &= v_2 - v_1; & x_1 + x_2 &= v_2; \\
x_3 &= v_3 - v_2; & x_1 + x_2 + x_3 &= v_3; \\
& \vdots \\
x_k &= v_k - v_{k-1}; & x_1 + \cdots + x_k &= v_k;
\end{aligned}$$

or

$$v_k = k v_1 - 2[1 + 2 + \cdots + (k - 1)] = k v_1 - k(k - 1), \tag{6.6}$$


which gives $v_k$ in terms of the as yet unknown $v_1$. We impose the
boundary condition $v_N = 0$ to obtain $0 = N v_1 - N(N - 1)$ or
$v_1 = N - 1$. Substituting this into (6.6) we obtain

$$v_k = k(N - k), \qquad k = 0, 1, \ldots, N, \tag{6.7}$$


for the mean duration of the game. Note that the mean duration is great- 
est for initial fortunes k that are midway between the boundaries 0 and N, 
as we would expect. 
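The same strategy (express every $v_k$ in terms of the unknown $v_1$, then impose $v_N = 0$) can be carried out mechanically for any $p$. The Python sketch below is an illustration under that reading of the derivation; the function name `mean_duration` and the choice $N = 6$ are assumptions made here.

```python
from fractions import Fraction


def mean_duration(N, p):
    """Solve (6.5) with v_0 = v_N = 0; p should be a Fraction in (0, 1).

    Each v_i is stored as a[i] + b[i] * v_1, with v_1 still unknown,
    mirroring the way (6.6) expresses v_k through v_1.
    """
    q = 1 - p
    a = [Fraction(0), Fraction(0)]   # v_0 = 0 and v_1 = 0 + 1 * v_1
    b = [Fraction(0), Fraction(1)]
    for i in range(1, N):
        # (6.5) rearranged: v_{i+1} = (v_i - 1 - q * v_{i-1}) / p
        a.append((a[i] - 1 - q * a[i - 1]) / p)
        b.append((b[i] - q * b[i - 1]) / p)
    v1 = -a[N] / b[N]                # boundary condition v_N = 0
    return [a[i] + b[i] * v1 for i in range(N + 1)]


# Fair game with N = 6 (hypothetical size): should reproduce k * (N - k).
fair_durations = mean_duration(6, Fraction(1, 2))
```

For the fair game the result is `[0, 5, 8, 9, 8, 5, 0]`, which is $k(N - k)$ for $N = 6$ and peaks at the midpoint, as noted above.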


6.1. The General Random Walk 


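We give the results of similar derivations on the random walk whose
transition matrix is

$$P = \begin{array}{c||cccccc} & 0 & 1 & 2 & 3 & \cdots & N \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 1 & q_1 & r_1 & p_1 & 0 & \cdots & 0 \\ 2 & 0 & q_2 & r_2 & p_2 & \cdots & 0 \\ \vdots & & & & & & \vdots \\ N & 0 & 0 & 0 & 0 & \cdots & 1 \end{array}$$

where $q_k > 0$ and $p_k > 0$ for $k = 1, \ldots, N - 1$. Let
$T = \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = N\}$ be the hitting time to
states 0 and $N$.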


Problems 


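6.1. The probability of gambler’s ruin

$$u_i = \Pr\{X_T = 0 \mid X_0 = i\} \tag{6.8}$$

satisfies the first step analysis equation

$$u_i = q_i u_{i-1} + r_i u_i + p_i u_{i+1} \quad \text{for } i = 1, \ldots, N - 1,$$

and

$$u_0 = 1, \qquad u_N = 0.$$

The solution is

$$u_i = \frac{\rho_i + \cdots + \rho_{N-1}}{1 + \rho_1 + \rho_2 + \cdots + \rho_{N-1}}, \qquad i = 1, \ldots, N - 1, \tag{6.9}$$

where

$$\rho_k = \frac{q_1 q_2 \cdots q_k}{p_1 p_2 \cdots p_k}, \qquad k = 1, \ldots, N - 1. \tag{6.10}$$

6.2. The mean hitting time

$$v_k = E[T \mid X_0 = k] \tag{6.11}$$

satisfies the equations

$$v_k = 1 + q_k v_{k-1} + r_k v_k + p_k v_{k+1} \quad \text{and} \quad v_0 = v_N = 0. \tag{6.12}$$

The solution is

$$v_k = \left(\frac{\Phi_1 + \cdots + \Phi_{N-1}}{1 + \rho_1 + \cdots + \rho_{N-1}}\right)(1 + \rho_1 + \cdots + \rho_{k-1}) - (\Phi_1 + \cdots + \Phi_{k-1}) \quad \text{for } k = 1, \ldots, N - 1, \tag{6.13}$$

where $\rho_i$ is given in (6.10) and

$$\begin{aligned}
\Phi_i &= \left(\frac{1}{q_1} + \frac{1}{q_2\rho_1} + \cdots + \frac{1}{q_i\rho_{i-1}}\right)\rho_i \\
&= \frac{q_2 \cdots q_i}{p_1 \cdots p_i} + \frac{q_3 \cdots q_i}{p_2 \cdots p_i} + \cdots + \frac{q_i}{p_{i-1}p_i} + \frac{1}{p_i}
\end{aligned} \tag{6.14}$$

for $i = 1, \ldots, N - 1$.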
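The quantities $\rho_k$ and $\Phi_i$ are simple running products and sums, so formulas (6.9) and (6.13) can be evaluated directly. The following Python sketch is one possible arrangement, not the authors' own procedure; the function name, the list layout of the inputs, and the example rates are assumptions made for illustration, and the convention $\rho_0 = 1$ (an empty product) is used.

```python
from fractions import Fraction


def general_walk_functionals(q, p):
    """Evaluate (6.9) and (6.13) for a general random walk.

    q[k - 1] and p[k - 1] hold q_k and p_k (as Fractions) for k = 1..N-1.
    Returns (u, v), where u[i - 1] = u_i and v[k - 1] = v_k.
    """
    N = len(q) + 1
    rho = [Fraction(1)]                      # rho_0 = 1 (empty product)
    for k in range(1, N):
        rho.append(rho[k - 1] * q[k - 1] / p[k - 1])   # (6.10)
    phi = [Fraction(0)]                      # placeholder so phi[i] is Phi_i
    inner = Fraction(0)
    for i in range(1, N):
        inner += 1 / (q[i - 1] * rho[i - 1])
        phi.append(inner * rho[i])           # first line of (6.14)
    total_rho = sum(rho)                     # 1 + rho_1 + ... + rho_{N-1}
    total_phi = sum(phi)                     # Phi_1 + ... + Phi_{N-1}
    u = [sum(rho[i:]) / total_rho for i in range(1, N)]
    v = [total_phi / total_rho * sum(rho[:k]) - sum(phi[1:k])
         for k in range(1, N)]
    return u, v


# Hypothetical rates with N = 3: q_1 = 1/4, p_1 = 1/2, q_2 = 1/2, p_2 = 1/4.
q_rates = [Fraction(1, 4), Fraction(1, 2)]
p_rates = [Fraction(1, 2), Fraction(1, 4)]
u_vals, v_vals = general_walk_functionals(q_rates, p_rates)
```

For these rates the sketch gives $u_1 = 3/5$, $u_2 = 2/5$ and $v_1 = v_2 = 4$, and both sets of values can be checked by substituting them into the first step equations with $r_k = 1 - p_k - q_k$.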


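6.3. Fix a state $k$, where $0 < k < N$, and let $W_{ik}$ be the mean total
visits to state $k$ starting from $i$. Formally, the definition is

$$W_{ik} = E\left[\sum_{n=0}^{T-1} \mathbf{1}\{X_n = k\} \,\Big|\, X_0 = i\right], \tag{6.15}$$

where

$$\mathbf{1}\{X_n = k\} = \begin{cases} 1 & \text{if } X_n = k, \\ 0 & \text{if } X_n \ne k. \end{cases}$$

Then $W_{ik}$ satisfies the equations

$$W_{ik} = \delta_{ik} + q_i W_{i-1,k} + r_i W_{ik} + p_i W_{i+1,k} \quad \text{for } i = 1, \ldots, N - 1$$

and

$$W_{0k} = W_{Nk} = 0,$$

where

$$\delta_{ik} = \begin{cases} 1 & \text{if } i = k, \\ 0 & \text{if } i \ne k. \end{cases}$$

The solution is

$$W_{ik} = \begin{cases} \left[\dfrac{(1 + \rho_1 + \cdots + \rho_{i-1})(\rho_k + \cdots + \rho_{N-1})}{1 + \rho_1 + \cdots + \rho_{N-1}}\right]\dfrac{1}{q_k\rho_{k-1}} & \text{for } i \le k, \\[10pt] \left[\dfrac{(1 + \rho_1 + \cdots + \rho_{i-1})(\rho_k + \cdots + \rho_{N-1})}{1 + \rho_1 + \cdots + \rho_{N-1}} - (\rho_k + \cdots + \rho_{i-1})\right]\dfrac{1}{q_k\rho_{k-1}} & \text{for } i \ge k. \end{cases} \tag{6.16}$$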


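Example As a sample calculation of these functionals, we consider the
special case in which the transition probabilities are the same from row to
row. That is, we study the random walk whose transition probability matrix
is

$$P = \begin{array}{c||cccccc} & 0 & 1 & 2 & 3 & \cdots & N \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 1 & q & r & p & 0 & \cdots & 0 \\ 2 & 0 & q & r & p & \cdots & 0 \\ \vdots & & & & & & \vdots \\ N & 0 & 0 & 0 & 0 & \cdots & 1 \end{array}$$

with $p > 0$, $q > 0$, and $p + q + r = 1$. Let us abbreviate by setting
$\theta = (q/p)$, and then $\rho_k$, as defined in (6.10), simplifies
according to

$$\rho_k = \frac{q_1 q_2 \cdots q_k}{p_1 p_2 \cdots p_k} = \left(\frac{q}{p}\right)^k = \theta^k \quad \text{for } k = 1, \ldots, N - 1.$$

The probability of gambler’s ruin, as defined in (6.8) and evaluated in
(6.9), becomes

$$u_k = \Pr\{X_T = 0 \mid X_0 = k\} = \frac{\theta^k + \cdots + \theta^{N-1}}{1 + \theta + \cdots + \theta^{N-1}} = \begin{cases} \dfrac{\theta^k - \theta^N}{1 - \theta^N} & \text{if } \theta = (q/p) \ne 1, \\[8pt] \dfrac{N - k}{N} & \text{if } \theta = (q/p) = 1. \end{cases}$$

This, of course, agrees with the answer given in (6.3).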
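We turn to evaluating the mean time

$$v_k = E[T \mid X_0 = k] \quad \text{for } k = 1, \ldots, N - 1$$

by first substituting $\rho_i = \theta^i$ into (6.14) to obtain

$$\begin{aligned}
\Phi_i &= \left(\frac{1}{q} + \frac{1}{q\theta} + \cdots + \frac{1}{q\theta^{i-1}}\right)\theta^i \\
&= \frac{1}{q}(\theta^i + \theta^{i-1} + \cdots + \theta) \\
&= \frac{1}{p}(1 + \theta + \cdots + \theta^{i-1}) \\
&= \begin{cases} \dfrac{i}{p} & \text{when } p = q\ (\theta = 1), \\[8pt] \dfrac{1}{p}\left(\dfrac{1 - \theta^i}{1 - \theta}\right) & \text{when } p \ne q\ (\theta \ne 1). \end{cases}
\end{aligned}$$

Now observe that

$$1 + \rho_1 + \cdots + \rho_{i-1} = 1 + \theta + \cdots + \theta^{i-1} = p\Phi_i,$$

so that (6.13) reduces to

$$v_k = \frac{\Phi_k}{\Phi_N}(\Phi_1 + \cdots + \Phi_{N-1}) - (\Phi_1 + \cdots + \Phi_{k-1}). \tag{6.17}$$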


In order to continue, we need to simplify the terms of the form
$\Phi_1 + \cdots + \Phi_{j-1}$. We consider the two cases
$\theta = (q/p) = 1$ and $\theta = (q/p) \ne 1$ separately.

When $p = q$, or equivalently, $\theta = 1$, then $\Phi_i = i/p$, whence

$$\Phi_1 + \cdots + \Phi_{j-1} = \frac{1 + \cdots + (j - 1)}{p} = \frac{j(j - 1)}{2p},$$

which inserted into (6.17) gives

$$v_i = E[T \mid X_0 = i] = \frac{i}{N}\left[\frac{N(N - 1)}{2p}\right] - \frac{i(i - 1)}{2p} = \frac{i(N - i)}{2p} \quad \text{if } p = q. \tag{6.18}$$

When $p = \frac{1}{2}$, then $v_i = i(N - i)$ in agreement with (6.7).
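When $p \ne q$, so that $\theta = q/p \ne 1$, then

$$\Phi_i = \frac{1}{p}\left(\frac{1 - \theta^i}{1 - \theta}\right),$$

whence

$$\begin{aligned}
\Phi_1 + \cdots + \Phi_{j-1} &= \frac{1}{p(1 - \theta)}\left[(j - 1) - (\theta + \theta^2 + \cdots + \theta^{j-1})\right] \\
&= \frac{1}{p(1 - \theta)}\left[(j - 1) - \theta\left(\frac{1 - \theta^{j-1}}{1 - \theta}\right)\right],
\end{aligned}$$

and

$$\begin{aligned}
v_i = E[T \mid X_0 = i] &= \left(\frac{1 - \theta^i}{1 - \theta^N}\right)\frac{1}{p(1 - \theta)}\left[(N - 1) - \theta\left(\frac{1 - \theta^{N-1}}{1 - \theta}\right)\right] - \frac{1}{p(1 - \theta)}\left[(i - 1) - \theta\left(\frac{1 - \theta^{i-1}}{1 - \theta}\right)\right] \\
&= \frac{1}{p(1 - \theta)}\left[N\left(\frac{1 - \theta^i}{1 - \theta^N}\right) - i\right]
\end{aligned}$$

when $\theta = (q/p) \ne 1$.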

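Finally, we evaluate $W_{ik}$, expressed verbally as the mean number of
visits to state $k$ starting from $X_0 = i$ and defined formally in (6.15).
Again we consider the two cases $\theta = (q/p) = 1$ and
$\theta = (q/p) \ne 1$.

When $\theta = 1$, then $\rho_j = \theta^j = 1$ and
$1 + \cdots + \rho_{i-1} = i$, $\rho_k + \cdots + \rho_{N-1} = N - k$, and
(6.16) simplifies to

$$W_{ik} = \begin{cases} \dfrac{i(N - k)}{qN} & \text{for } 0 < i \le k < N, \\[8pt] \left[\dfrac{i(N - k)}{N} - (i - k)\right]\dfrac{1}{q} = \dfrac{k(N - i)}{qN} & \text{for } 0 < k < i < N, \end{cases}$$

that is,

$$W_{ik} = \frac{i(N - k)}{qN} - \frac{\max\{0, i - k\}}{q}. \tag{6.19}$$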
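When $\theta = (q/p) \ne 1$, then $\rho_j = \theta^j$ and

$$1 + \cdots + \rho_{i-1} = \frac{1 - \theta^i}{1 - \theta}, \qquad \rho_k + \cdots + \rho_{N-1} = \frac{\theta^k - \theta^N}{1 - \theta},$$

and

$$q_k\rho_{k-1} = p_k\rho_k = p\theta^k.$$

In this case, (6.16) simplifies to

$$W_{ik} = \frac{(1 - \theta^i)(\theta^k - \theta^N)}{(1 - \theta)(1 - \theta^N)}\left(\frac{1}{p\theta^k}\right) \quad \text{for } 0 < i \le k < N,$$

and

$$W_{ik} = \left[\frac{(1 - \theta^i)(\theta^k - \theta^N)}{(1 - \theta)(1 - \theta^N)} - \frac{\theta^k - \theta^i}{1 - \theta}\right]\left(\frac{1}{p\theta^k}\right) \quad \text{for } 0 < k < i < N.$$

We may write the expression for $W_{ik}$ in a single line by introducing the
notation $(i - k)^+ = \max\{0, i - k\}$. Then

$$W_{ik} = \frac{(1 - \theta^i)(1 - \theta^{N-k})}{p(1 - \theta)(1 - \theta^N)} - \frac{1 - \theta^{(i-k)^+}}{p(1 - \theta)}. \tag{6.20}$$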
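A useful consistency check on (6.19) and (6.20) is that summing $W_{ik}$ over the interior states $k$ must give the mean duration $v_i$, since every period before absorption is spent in exactly one interior state. The Python sketch below evaluates the single-line forms; the function name `mean_visits` and the example parameters are assumptions made here for illustration.

```python
from fractions import Fraction


def mean_visits(i, k, N, p, q):
    """Mean visits W_ik for the constant-rate walk, via (6.19) or (6.20).

    p and q should be Fractions with p + q <= 1; r = 1 - p - q is implied.
    """
    theta = q / p
    excess = max(0, i - k)                     # (i - k)^+
    if theta == 1:
        return (Fraction(i * (N - k), N) - excess) / q            # (6.19)
    first = ((1 - theta**i) * (1 - theta**(N - k))
             / (p * (1 - theta) * (1 - theta**N)))
    second = (1 - theta**excess) / (p * (1 - theta))
    return first - second                                          # (6.20)


# Hypothetical example: p = 1/2, q = 1/4 (so r = 1/4), N = 3, start at i = 1.
p, q, N = Fraction(1, 2), Fraction(1, 4), 3
total_visits = sum(mean_visits(1, k, N, p, q) for k in range(1, N))

# Closed form for the mean duration when theta != 1, from the text above.
theta = q / p
v_1 = (N * (1 - theta) / (1 - theta**N) - 1) / (p * (1 - theta))
```

Here `total_visits` and `v_1` both come out to $20/7$, so the visit counts and the mean duration agree for this example.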
Short-term cash management is the review and control of a corporation’s
cash balances, short-term loan balances, and short term marketable secu- 
rity holdings. The objective is to maintain the smallest cash balances that 
are adequate to meet future disbursements. The corporation cashier tries 
to eliminate idle cash balances (by reducing short-term loans or buying 
treasury bills, for example) but to cover potential cash shortages (by sell- 
ing treasury bills or increasing short-term loans). The analogous problem 
for an individual is to maintain an optimal balance between a checking 
and a savings account. 

In the absence of intervention, the corporation’s cash level fluctuates 
randomly as the result of many relatively small transactions. We model 
this by dividing time into successive, equal length periods, each of short 
duration, and by assuming that from period to period, the cash level 
moves up or down one unit, each with probability one-half. Let $X_n$ be the
cash on hand in period $n$. We are assuming that $\{X_n\}$ is the random
walk in which

$$\Pr\{X_{n+1} = k \pm 1 \mid X_n = k\} = \tfrac{1}{2}.$$

The cashier’s job is to intervene if the cash level ever gets too low or too
high. We consider cash management strategies that are specified by two
parameters, $s$ and $S$, where $0 < s < S$. The policy is as follows: If the
cash level ever drops to zero, then sell sufficient treasury bills to
replenish the cash level up to $s$. If the cash level ever increases up to
$S$, then invest in treasury bills in order to reduce the cash level to $s$.
A typical sequence of cash levels $\{X_n\}$ when $s = 2$ and $S = 5$ is
depicted in Figure 6.3.


Figure 6.3 Several typical cycles in a cash inventory model. (Cash level
$X_n$ against time $n$, with $s = 2$ and $S = 5$; a first, second, and
third cycle are marked.)

We see that the cash level fluctuates in a series of statistically similar
cycles, each cycle beginning with $s$ units of cash on hand and ending at
the next intervention, whether a replenishment or reduction in cash. We
begin our study by evaluating the mean length of a cycle and the mean
total unit periods of cash on hand during a cycle. Later we use these
quantities to evaluate the long run performance of the model.

Let $T$ denote the random time at which the cash on hand first reaches
the level $S$ or 0. That is, $T$ is the time of the first transaction. Let
$v_s = E[T \mid X_0 = s]$ be the mean time to the first transaction, or the
mean cycle length. From (6.7) we have

$$v_s = s(S - s). \tag{6.21}$$


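Next, fix an arbitrary state $k$ ($0 < k < S$) and let $W_{sk}$ be the mean
number of visits to $k$ up to time $T$ for a process starting at $X_0 = s$.
From (6.19) we have

$$W_{sk} = 2\left[\frac{s}{S}(S - k) - (s - k)^+\right]. \tag{6.22}$$

Using this we obtain the mean total unit periods of cash on hand up to
time $T$ starting from $X_0 = s$ by weighting $W_{sk}$ by $k$ and summing
according to

$$\begin{aligned}
W_s &= \sum_{k=1}^{S-1} k W_{sk} \\
&= 2\left[\frac{s}{S}\sum_{k=1}^{S-1} k(S - k) - \sum_{k=1}^{s-1} k(s - k)\right] \\
&= 2\left[\frac{s}{S}\cdot\frac{S(S + 1)(S - 1)}{6} - \frac{s(s + 1)(s - 1)}{6}\right] \\
&= \frac{s}{3}\left[S^2 - s^2\right].
\end{aligned} \tag{6.23}$$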
Having obtained these single cycle results, we will use them to evaluate
the long run behavior of the model. Note that each cycle starts from
the cash level $s$, and thus the cycles are statistically independent. Let
$K$ be the fixed cost of each transaction. Let $T_i$ be the duration of the
$i$th cycle and let $R_i$ be the total opportunity cost of holding cash on
hand during that time. Over $n$ cycles the average cost per unit time is

$$\text{Average cost} = \frac{nK + R_1 + \cdots + R_n}{T_1 + \cdots + T_n}.$$

* Use the sum $\sum_{k=1}^{a-1} k(a - k) = \frac{1}{6}a(a + 1)(a - 1)$.


Next, divide the numerator and denominator by $n$, let $n \to \infty$, and
invoke the law of large numbers to obtain

$$\text{Long run average cost} = \frac{K + E[R_i]}{E[T_i]}.$$


Let $r$ denote the opportunity cost per unit time of cash on hand. Then
$E[R_i] = rW_s$, while $E[T_i] = v_s$. Since these quantities were
determined in (6.21) and (6.23), we have

$$\text{Long run average cost} = \frac{K + (1/3)rs(S^2 - s^2)}{s(S - s)}. \tag{6.24}$$


In order to use calculus to determine the cost-minimizing values for $S$ and
$s$, it simplifies matters if we introduce the new variable $x = s/S$. Then
(6.24) becomes

$$\text{Long run average cost} = \frac{K + (1/3)rS^3x(1 - x^2)}{S^2x(1 - x)},$$

whence

$$\frac{d(\text{average cost})}{dx} = 0 = -\frac{K(1 - 2x)}{S^2x^2(1 - x)^2} + \frac{1}{3}rS,$$

$$\frac{d(\text{average cost})}{dS} = 0 = -\frac{2K}{S^3x(1 - x)} + \frac{r(1 + x)}{3},$$

which yield

$$x_{\text{opt}} = \frac{1}{3} \quad \text{and} \quad S_{\text{opt}} = 3s_{\text{opt}} = 3\sqrt[3]{\frac{3K}{4r}}.$$

Implementing the cash management strategy with the values $s_{\text{opt}}$
and $S_{\text{opt}}$ results in the optimal balance between transaction
costs and the opportunity cost of holding cash on hand.
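The optimum can be checked numerically by evaluating (6.24) at the stated values and at nearby policies. The Python sketch below does this for hypothetical cost parameters ($K = 4$, $r = 3$, chosen here so that the optimum falls on whole numbers); the function names are illustrative, and $s$ and $S$ are treated as continuous variables, as in the calculus argument above.

```python
def long_run_average_cost(s, S, K, r):
    """Long run average cost (6.24) for policy (s, S), with 0 < s < S."""
    return (K + r * s * (S**2 - s**2) / 3) / (s * (S - s))


def optimal_policy(K, r):
    """Return (s_opt, S_opt) with S_opt = 3 * (3K / (4r))**(1/3), s_opt = S_opt / 3."""
    S_opt = 3 * (3 * K / (4 * r)) ** (1 / 3)
    return S_opt / 3, S_opt


# Hypothetical costs: K = 4 per transaction, r = 3 per unit of cash per period.
K_cost, r_cost = 4.0, 3.0
s_opt, S_opt = optimal_policy(K_cost, r_cost)
best = long_run_average_cost(s_opt, S_opt, K_cost, r_cost)

# Costs at a few nearby policies, which should all be at least as large.
nearby = [
    long_run_average_cost(s_opt + ds, S_opt + dS, K_cost, r_cost)
    for ds, dS in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
]
```

With these parameters the sketch gives $s_{\text{opt}} = 1$, $S_{\text{opt}} = 3$, and a long run average cost of 6, and each perturbed policy in `nearby` costs slightly more.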
6.3. The Success Runs Markov Chain


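Consider the success runs Markov chain on $N + 1$ states whose transition
matrix is

$$P = \begin{array}{c||ccccccc} & 0 & 1 & 2 & 3 & \cdots & N - 1 & N \\ \hline 0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & p_1 & r_1 & q_1 & 0 & \cdots & 0 & 0 \\ 2 & p_2 & 0 & r_2 & q_2 & \cdots & 0 & 0 \\ \vdots & \vdots & & & & & & \vdots \\ N - 1 & p_{N-1} & 0 & 0 & 0 & \cdots & r_{N-1} & q_{N-1} \\ N & 0 & 0 & 0 & 0 & \cdots & 0 & 1 \end{array}$$

Note that states 0 and $N$ are absorbing; once the process reaches one of
these two states it remains there.
Let $T$ be the hitting time to states 0 or $N$,

$$T = \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = N\}.$$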
Problems 
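6.1. The probability of absorption at 0 starting from state $k$

$$u_k = \Pr\{X_T = 0 \mid X_0 = k\} \tag{6.25}$$

satisfies the equation

$$u_k = p_k + r_k u_k + q_k u_{k+1}$$

for $k = 1, \ldots, N - 1$ and $u_0 = 1$, $u_N = 0$.

The solution is

$$u_k = 1 - \left(\frac{q_k}{p_k + q_k}\right) \cdots \left(\frac{q_{N-1}}{p_{N-1} + q_{N-1}}\right) \tag{6.26}$$

for $k = 1, \ldots, N - 1$.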


6.2. The mean hitting time

$$v_k = E[T \mid X_0 = k] \tag{6.27}$$

satisfies the equation

$$v_k = 1 + r_k v_k + q_k v_{k+1}$$

for $k = 1, \ldots, N - 1$ and $v_0 = v_N = 0$.

The solution is

$$v_k = \frac{1}{p_k + q_k} + \frac{\pi_{k,k+1}}{p_{k+1} + q_{k+1}} + \cdots + \frac{\pi_{k,N-1}}{p_{N-1} + q_{N-1}}, \tag{6.28}$$

where

$$\pi_{kj} = \left(\frac{q_k}{p_k + q_k}\right)\left(\frac{q_{k+1}}{p_{k+1} + q_{k+1}}\right) \cdots \left(\frac{q_{j-1}}{p_{j-1} + q_{j-1}}\right) \tag{6.29}$$

for $k < j$.


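6.3. Fix a state $j$ ($0 < j < N$) and let $W_{ij}$ be the mean total visits
to state $j$ starting from state $i$ [see equation (6.15)]. Then

$$W_{ij} = \begin{cases} \dfrac{1}{p_j + q_j} & \text{for } j = i, \\[8pt] \left(\dfrac{q_i}{p_i + q_i}\right) \cdots \left(\dfrac{q_{j-1}}{p_{j-1} + q_{j-1}}\right)\dfrac{1}{p_j + q_j} & \text{for } i < j, \\[8pt] 0 & \text{for } i > j. \end{cases} \tag{6.30}$$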
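Formulas (6.26), (6.28), and (6.30) share the same building block, the product $\pi_{kj}$ of (6.29). The Python sketch below evaluates all three from that product; the function name, the dictionary return format, and the example rates are assumptions made for illustration, with $\pi_{kk}$ taken to be 1 (an empty product).

```python
from fractions import Fraction


def success_run_functionals(p, q):
    """Evaluate (6.26), (6.28), (6.30) for the success runs chain.

    p[k - 1] and q[k - 1] hold p_k and q_k (as Fractions) for k = 1..N-1.
    Returns dictionaries u[k], v[k], and W[(i, j)] over the interior states.
    """
    N = len(p) + 1
    leave = {k: p[k - 1] + q[k - 1] for k in range(1, N)}    # p_k + q_k
    ratio = {k: q[k - 1] / leave[k] for k in range(1, N)}    # q_k / (p_k + q_k)

    def pi(k, j):
        """Product (6.29) of ratio[k], ..., ratio[j - 1]; equals 1 when j == k."""
        out = Fraction(1)
        for m in range(k, j):
            out *= ratio[m]
        return out

    u = {k: 1 - pi(k, N) for k in range(1, N)}                               # (6.26)
    v = {k: sum(pi(k, j) / leave[j] for j in range(k, N))
         for k in range(1, N)}                                               # (6.28)
    W = {(i, j): (pi(i, j) / leave[j] if i <= j else Fraction(0))
         for i in range(1, N) for j in range(1, N)}                          # (6.30)
    return u, v, W


# Hypothetical rates with N = 3: p_1 = 1/2, q_1 = 1/4, p_2 = 1/4, q_2 = 1/2.
run_u, run_v, run_W = success_run_functionals(
    [Fraction(1, 2), Fraction(1, 4)], [Fraction(1, 4), Fraction(1, 2)]
)
```

For these rates $u_1 = 7/9$, $v_1 = 16/9$, and $W_{11} + W_{12} = 4/3 + 4/9 = 16/9$, so the mean visit counts again add up to the mean hitting time.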
Exercises 
6.1. A rat is put into the linear maze as shown: 
0 1 2 3 4 5
shock hae food 


(a) Assume that the rat is equally likely to move right or left at each
step. What is the probability that the rat finds the food before
getting shocked?

(b) As a result of learning, at each step the rat moves to the right with
probability $p > \frac{1}{2}$ and to the left with probability
$q = 1 - p < \frac{1}{2}$. What is the probability that the rat finds the
food before getting shocked?


6.2. Customer accounts receivable at Smith Company are classified 
each month according to 


0: Current 

1: 30 to 60 days past due 
2: 60 to 90 days past due 
3: Over 90 days past due 


Consider a particular customer account and suppose that it evolves month
to month as a Markov chain $\{X_n\}$ whose transition probability matrix is

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 0.9 & 0.1 & 0 & 0 \\ 1 & 0.5 & 0 & 0.5 & 0 \\ 2 & 0.3 & 0 & 0 & 0.7 \\ 3 & 0.2 & 0 & 0 & 0.8 \end{array}$$

Suppose that a certain customer’s account is now in state 1: 30 to 60 days
past due. What is the probability that this account will be paid (and thereby
enter state 0: Current) before it becomes over 90 days past due? That is, let
$T = \min\{n \ge 0;\ X_n = 0 \text{ or } X_n = 3\}$. Determine
$\Pr\{X_T = 0 \mid X_0 = 1\}$.


6.3. Players A and B each have $50 at the beginning of a game in which 
each player bets $1 at each play, and the game continues until one player 
is broke. Suppose there is a constant probability p = 0.492929 ... that 
Player A wins on any given bet. What is the mean duration of the game? 


6.4. Consider the random walk Markov chain whose transition probability
matrix is given by

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 1 & 0 & 0 & 0 \\ 1 & 0.3 & 0 & 0.7 & 0 \\ 2 & 0 & 0.3 & 0 & 0.7 \\ 3 & 0 & 0 & 0 & 1 \end{array}$$

Starting in state 1, determine the mean time until absorption. Do this first
using the basic first step approach of equations (4.6), and second using the
particular formula for $v_i$ that follows equation (6.18), which applies for
a random walk in which $p \ne q$.


Problems 


6.1. Consider the random walk Markov chain whose transition probability
matrix is given by

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 1 & 0 & 0 & 0 \\ 1 & 0.3 & 0 & 0.7 & 0 \\ 2 & 0 & 0.1 & 0 & 0.9 \\ 3 & 0 & 0 & 0 & 1 \end{array}$$

Starting in state 1, determine the mean time until absorption. Do this first
using the basic first step approach of equations (4.6), and second using the
particular results for a random walk given in equation (6.13).


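6.2. Consider the Markov chain $\{X_n\}$ whose transition matrix is

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & \alpha & 0 & \beta & 0 \\ 1 & \alpha & 0 & 0 & \beta \\ 2 & \alpha & \beta & 0 & 0 \\ 3 & 0 & 0 & 0 & 1 \end{array}$$

where $\alpha > 0$, $\beta > 0$, and $\alpha + \beta = 1$. Determine the
mean time to reach state 3 starting from state 0. That is, find
$E[T \mid X_0 = 0]$, where $T = \min\{n \ge 0;\ X_n = 3\}$.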


6.3. Computer Challenge. You have two urns: A and B, with $a$ balls in
A and $b$ in B. You pick an urn at random, each urn being equally likely,
and move a ball from it to the other urn. You do this repeatedly. The game
ends when either of the urns becomes empty. The number of balls in A at
the $n$th move is a simple random walk, and the expected duration of the
game is $E[T] = ab$ [see equation (6.7)]. Now consider three urns, A, B,
and C, with $a$, $b$, and $c$ balls, respectively. You pick an urn at random, each
being equally likely, and move a ball from it to one of the other two urns, 
each being equally likely. The game ends when one of the three urns be- 
comes empty. What is the mean duration of the game? If you can guess 
the general form of this mean time by computing it in a variety of partic- 
ular cases, it is not particularly difficult to verify it by a first step analysis. 
What about four urns? 


7. Another Look at First Step Analysis* 


In this section we provide an alternative approach to evaluating the func- 
tionals treated in Section 4. The nth power of a transition probability ma- 
trix having both transient and absorbing states is directly evaluated. From 
these $n$th powers it is possible to extract the mean number of visits to a
transient state j prior to absorption, the mean time until absorption, and the 
probability of absorption in any particular absorbing state k. These func- 
tionals all depend on the initial state X, = i, and as a by-product of the de- 
rivation, we show that, as functions of this initial state i, these functionals 
satisfy their appropriate first step analysis equations. 


Consider a Markov chain whose states are labeled $0, 1, \ldots, N$. States
$0, 1, \ldots, r - 1$ are transient in that $P_{ij}^{(n)} \to 0$ as
$n \to \infty$ for $0 \le i, j < r$, while states $r, \ldots, N$ are
absorbing, or trap, and here $P_{ii} = 1$ for $r \le i \le N$.

The transition matrix has the form

$$P = \begin{Vmatrix} Q & R \\ 0 & I \end{Vmatrix}, \tag{7.1}$$

where $0$ is an $(N - r + 1) \times r$ matrix all of whose components are
zero, $I$ is an $(N - r + 1) \times (N - r + 1)$ identity matrix, and
$Q_{ij} = P_{ij}$ for $0 \le i, j < r$.

To illustrate the calculations, begin with the four-state transition matrix

$$P = \begin{array}{c||cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & Q_{00} & Q_{01} & R_{02} & R_{03} \\ 1 & Q_{10} & Q_{11} & R_{12} & R_{13} \\ 2 & 0 & 0 & 1 & 0 \\ 3 & 0 & 0 & 0 & 1 \end{array} \tag{7.2}$$

* This section contains material at a more difficult level. It is not
prerequisite to what follows.
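Straightforward matrix multiplication shows the square of $P$ to be

$$P^2 = \begin{Vmatrix} Q^2 & R + QR \\ 0 & I \end{Vmatrix}. \tag{7.3}$$

Continuing on to the third power, we have

$$P^3 = \begin{Vmatrix} Q & R \\ 0 & I \end{Vmatrix} \times \begin{Vmatrix} Q^2 & R + QR \\ 0 & I \end{Vmatrix} = \begin{Vmatrix} Q^3 & R + QR + Q^2R \\ 0 & I \end{Vmatrix},$$

and for higher values of $n$,

$$P^n = \begin{Vmatrix} Q^n & (I + Q + \cdots + Q^{n-1})R \\ 0 & I \end{Vmatrix}. \tag{7.4}$$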

The consideration of four states was for typographical convenience only.
It is straightforward to verify that the $n$th power of $P$ is given by (7.4)
for the general $(N + 1)$-state transition matrix of (7.1) in which states
$0, 1, \ldots, r - 1$ are transient ($P_{ij}^{(n)} \to 0$ as $n \to \infty$
for $0 \le i, j < r$) while states $r, \ldots, N$ are absorbing
($P_{ii} = 1$ for $r \le i \le N$).
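The block formula (7.4) can be confirmed on a small example by comparing it with brute-force powers of the full matrix. The Python sketch below uses exact fractions and plain nested lists; the helper names and the particular four-state chain (two transient and two absorbing states) are hypothetical choices made for illustration.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def power_blocks(Q, R, n):
    """Return the upper blocks of P^n in (7.4): Q^n and (I + Q + ... + Q^{n-1}) R."""
    r = len(Q)
    Qn = identity(r)                                   # Q^0
    partial = [[Fraction(0)] * r for _ in range(r)]    # empty sum so far
    for _ in range(n):
        partial = mat_add(partial, Qn)                 # add the current power of Q
        Qn = mat_mul(Qn, Q)                            # advance to the next power
    return Qn, mat_mul(partial, R)


# Hypothetical chain: states 0, 1 transient; states 2, 3 absorbing.
Q = [[Fraction(1, 4), Fraction(1, 4)], [Fraction(1, 3), Fraction(1, 3)]]
R = [[Fraction(1, 4), Fraction(1, 4)], [Fraction(1, 6), Fraction(1, 6)]]
P = [Q[0] + R[0], Q[1] + R[1],
     [Fraction(0), Fraction(0), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]]

P5 = identity(4)
for _ in range(5):
    P5 = mat_mul(P5, P)                                # brute-force fifth power
Q5, upper_right = power_blocks(Q, R, 5)
```

The upper-left $2 \times 2$ block of `P5` should equal `Q5` and the upper-right block should equal `upper_right`, while the lower rows of `P5` remain those of the identity.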

We turn to the interpretation of (7.4). Let $W_{ij}^{(n)}$ be the mean
number of visits to state $j$ up to stage $n$ for a Markov chain starting in
state $i$. Formally,

$$W_{ij}^{(n)} = E\left[\sum_{l=0}^{n} \mathbf{1}\{X_l = j\} \,\Big|\, X_0 = i\right], \tag{7.5}$$

where

$$\mathbf{1}\{X_l = j\} = \begin{cases} 1 & \text{if } X_l = j, \\ 0 & \text{if } X_l \ne j. \end{cases} \tag{7.6}$$

Now, $E[\mathbf{1}\{X_l = j\} \mid X_0 = i] = \Pr\{X_l = j \mid X_0 = i\} = P_{ij}^{(l)}$,
and since the expected value of a sum is the sum of the expected values, we
obtain from (7.5) that

$$W_{ij}^{(n)} = \sum_{l=0}^{n} E[\mathbf{1}\{X_l = j\} \mid X_0 = i] = \sum_{l=0}^{n} P_{ij}^{(l)}. \tag{7.7}$$
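Equation (7.7) holds for all states $i$, $j$, but it has the most meaning
when $i$ and $j$ are transient. Because (7.4) asserts that
$P_{ij}^{(l)} = Q_{ij}^{(l)}$ when $0 \le i, j < r$, then

$$W_{ij}^{(n)} = Q_{ij}^{(0)} + Q_{ij}^{(1)} + \cdots + Q_{ij}^{(n)}, \qquad 0 \le i, j < r,$$

where

$$Q_{ij}^{(0)} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

In matrix notation, $Q^{(0)} = I$, and because $Q^{(n)} = Q^n$, the $n$th
power of $Q$, then

$$\begin{aligned}
W^{(n)} &= I + Q + Q^2 + \cdots + Q^n \\
&= I + Q(I + Q + \cdots + Q^{n-1}) \\
&= I + QW^{(n-1)}.
\end{aligned} \tag{7.8}$$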
Upon writing out the matrix equation (7.8) in terms of the matrix entries,
we recognize the results of a first step analysis. We have

$$W_{ij}^{(n)} = \delta_{ij} + \sum_{k=0}^{r-1} Q_{ik}W_{kj}^{(n-1)} = \delta_{ij} + \sum_{k=0}^{r-1} P_{ik}W_{kj}^{(n-1)}.$$

In words, the equation asserts that the mean number of visits to state $j$
in the first $n$ stages starting from the initial stage $i$ includes the
initial visit if $i = j$ ($\delta_{ij}$) plus the future visits during the
$n - 1$ remaining stages weighted by the appropriate transition
probabilities.

We pass to the limit in (7.8) and obtain for
$$
W_{ij} = \lim_{n \to \infty} W_{ij}^{(n)} = E[\text{Total visits to } j \mid X_0 = i], \qquad 0 \le i, j < r,
$$
the matrix equations
$$
W = I + Q + Q^2 + \cdots
$$
and
$$
W = I + QW. \tag{7.9}
$$
In terms of its entries, (7.9) is
$$
W_{ij} = \delta_{ij} + \sum_{k=0}^{r-1} P_{ik} W_{kj} \qquad \text{for } i, j = 0, \ldots, r - 1. \tag{7.10}
$$


Equation (7.10) is the same as equation (4.11), which was derived by a 
first step analysis. 
Rewriting equation (7.9) in the form
$$
W - QW = (I - Q)W = I, \tag{7.11}
$$
we see that $W = (I - Q)^{-1}$, the inverse matrix to I - Q. The matrix W is
often called the fundamental matrix associated with Q. 
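
As a numerical illustration of (7.8), (7.9), and (7.11), the following Python sketch builds the partial sums through the recursion $W^{(n)} = I + QW^{(n-1)}$ and compares the result with the direct inverse of I - Q. The 2 x 2 block Q and all function names are assumptions made for this sketch; they are not taken from the text.

```python
# Minimal sketch: the fundamental matrix W = (I - Q)^{-1} for a 2x2 transient block.
# The matrix Q below is a hypothetical example chosen for illustration.

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum of two square matrices."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def partial_sum_W(q, n):
    """W^(n) = I + Q + ... + Q^n, built with W^(n) = I + Q W^(n-1) as in (7.8)."""
    w = identity(len(q))                       # W^(0) = I
    for _ in range(n):
        w = mat_add(identity(len(q)), mat_mul(q, w))
    return w

Q = [[0.3, 0.4],
     [0.2, 0.5]]
I_minus_Q = [[1 - Q[0][0], -Q[0][1]],
             [-Q[1][0], 1 - Q[1][1]]]

W = inverse_2x2(I_minus_Q)      # (7.11)
W_200 = partial_sum_W(Q, 200)   # truncated series from (7.9)

print("W       =", W)
print("W^(200) =", W_200)
```

For this Q the powers $Q^n$ decay geometrically, so the truncated series and the inverse agree to many decimal places; a larger transient block would need a general linear solver in place of the 2 x 2 formula.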


Let T be the time of absorption. Formally, since states r, r + 1, ..., N
are the absorbing ones, the definition is
$$
T = \min\{n \ge 0;\ r \le X_n \le N\}.
$$
Then the (i, j)th element $W_{ij}$ of the fundamental matrix W evaluates
$$
W_{ij} = E\left[\sum_{n=0}^{T-1} \mathbf{1}\{X_n = j\} \,\Big|\, X_0 = i\right] \qquad \text{for } 0 \le i, j < r. \tag{7.12}
$$


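Let $v_i = E[T \mid X_0 = i]$ be the mean time to absorption starting from state i.
The time to absorption is composed of sojourns in the transient states.
Formally,
$$
\sum_{j=0}^{r-1} \sum_{n=0}^{T-1} \mathbf{1}\{X_n = j\}
  = \sum_{n=0}^{T-1} \sum_{j=0}^{r-1} \mathbf{1}\{X_n = j\}
  = \sum_{n=0}^{T-1} 1 = T.
$$
It follows from (7.12), then, that
$$
\sum_{j=0}^{r-1} W_{ij}
  = \sum_{j=0}^{r-1} E\left[\sum_{n=0}^{T-1} \mathbf{1}\{X_n = j\} \,\Big|\, X_0 = i\right]
  = E[T \mid X_0 = i] = v_i \qquad \text{for } 0 \le i < r. \tag{7.13}
$$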
Summing equation (7.10) over transient states j as follows,
$$
\sum_{j=0}^{r-1} W_{ij} = \sum_{j=0}^{r-1} \delta_{ij} + \sum_{j=0}^{r-1} \sum_{k=0}^{r-1} P_{ik} W_{kj} \qquad \text{for } i = 0, 1, \ldots, r - 1,
$$
and using the equivalence $v_i = \sum_{j=0}^{r-1} W_{ij}$ leads to
$$
v_i = 1 + \sum_{k=0}^{r-1} P_{ik} v_k \qquad \text{for } i = 0, 1, \ldots, r - 1. \tag{7.14}
$$


This equation is identical with that derived by first step analysis in (4.10). 
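We turn to the hitting probabilities. Recall that states k = r, ..., N are
absorbing. Since such a state cannot be left once entered, the probability of
absorption in a particular absorbing state k up to time n, starting from
initial state i, is simply
$$
\begin{aligned}
P_{ik}^{(n)} &= \Pr\{X_n = k \mid X_0 = i\} \\
             &= \Pr\{T \le n \text{ and } X_T = k \mid X_0 = i\}
\end{aligned}
\qquad \text{for } i = 0, \ldots, r - 1;\ k = r, \ldots, N, \tag{7.15}
$$
where $T = \min\{n \ge 0;\ r \le X_n \le N\}$ is the time of absorption. Let
$$
U_{ik}^{(n)} = \Pr\{T \le n \text{ and } X_T = k \mid X_0 = i\} \qquad \text{for } 0 \le i < r \text{ and } r \le k \le N. \tag{7.16}
$$
Referring to (7.4) and (7.15), we give the matrix $U^{(n)}$ by
$$
\begin{aligned}
U^{(n)} &= (I + Q + \cdots + Q^{n-1})R \\
        &= W^{(n-1)}R \qquad [\text{by (7.8)}].
\end{aligned}
\tag{7.17}
$$
If we pass to the limit in n, we obtain the hitting probabilities
$$
U_{ik} = \lim_{n \to \infty} U_{ik}^{(n)} = \Pr\{X_T = k \mid X_0 = i\} \qquad \text{for } 0 \le i < r \text{ and } r \le k \le N.
$$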


Equation (7.17) then leads to an expression of the hitting probability
matrix U in terms of the fundamental matrix W as simply U = WR, or
$$
U_{ik} = \sum_{j=0}^{r-1} W_{ij} R_{jk} \qquad \text{for } 0 \le i < r \text{ and } r \le k \le N. \tag{7.18}
$$
Equation (7.18) may be used in conjunction with (7.10) to verify the first
step analysis equation for $U_{ik}$. We multiply (7.10) by $R_{jk}$ and sum, obtaining
thereby
$$
\sum_{j=0}^{r-1} W_{ij} R_{jk} = \sum_{j=0}^{r-1} \delta_{ij} R_{jk} + \sum_{j=0}^{r-1} \sum_{l=0}^{r-1} P_{il} W_{lj} R_{jk},
$$
which with (7.18) gives
$$
\begin{aligned}
U_{ik} &= R_{ik} + \sum_{l=0}^{r-1} P_{il} U_{lk} \\
       &= P_{ik} + \sum_{l=0}^{r-1} P_{il} U_{lk} \qquad \text{for } 0 \le i < r \text{ and } r \le k \le N.
\end{aligned}
$$
This equation was derived earlier by first step analysis in (4.8). 
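
The relations v = W1 from (7.13) and U = WR from (7.18) can be checked numerically. The sketch below assumes a hypothetical four-state chain laid out as in (7.2), with transient states 0, 1 and absorbing states 2, 3; the entries of Q and R are illustrative only. It evaluates the first step equations as residuals, which should vanish up to rounding.

```python
# Minimal sketch: mean absorption times v = W 1 and hitting probabilities U = W R.
# Q and R are hypothetical blocks of a 4-state chain laid out as in (7.2):
# states 0, 1 are transient and states 2, 3 are absorbing.

Q = [[0.3, 0.4],
     [0.2, 0.5]]
R = [[0.2, 0.1],
     [0.1, 0.2]]

# Fundamental matrix W = (I - Q)^{-1} via the 2x2 adjugate formula.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]
det = a * d - b * c
W = [[d / det, -b / det],
     [-c / det, a / det]]

# (7.13): v_i is the i-th row sum of W.
v = [sum(row) for row in W]

# (7.18): U_ik = sum_j W_ij R_jk.
U = [[sum(W[i][j] * R[j][k] for j in range(2)) for k in range(2)]
     for i in range(2)]

# Residuals of the first step equations (7.14) and U_ik = R_ik + sum_l P_il U_lk.
v_residual = [v[i] - (1 + sum(Q[i][k] * v[k] for k in range(2)))
              for i in range(2)]
U_residual = [[U[i][k] - (R[i][k] + sum(Q[i][l] * U[l][k] for l in range(2)))
               for k in range(2)] for i in range(2)]

print("v =", v)
print("U =", U)
```

Each row of U should sum to one, since absorption is certain from a transient state of a finite absorbing chain.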


Exercises 


7.1. Consider the Markov chain whose transition probability matrix is
given by
$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.1 & 0.2 & 0.5 & 0.2 \\
2 & 0.1 & 0.2 & 0.6 & 0.1 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$
The transition probability matrix corresponding to the nonabsorbing states
is
$$
Q = \begin{array}{c|cc}
  & 1 & 2 \\ \hline
1 & 0.2 & 0.5 \\
2 & 0.2 & 0.6
\end{array}
$$
Calculate the matrix inverse to I - Q, and from this determine

(a) the probability of absorption into state 0 starting from state 1;
(b) the mean time spent in each of states 1 and 2 prior to absorption.
7.2. Consider the random walk Markov chain whose transition probability
matrix is given by
$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 1 & 0 & 0 & 0 \\
1 & 0.3 & 0 & 0.7 & 0 \\
2 & 0 & 0.3 & 0 & 0.7 \\
3 & 0 & 0 & 0 & 1
\end{array}
$$
The transition probability matrix corresponding to the nonabsorbing states
is
$$
Q = \begin{array}{c|cc}
  & 1 & 2 \\ \hline
1 & 0 & 0.7 \\
2 & 0.3 & 0
\end{array}
$$
Calculate the matrix inverse to I - Q, and from this determine

(a) the probability of absorption into state 0 starting from state 1;
(b) the mean time spent in each of states 1 and 2 prior to absorption.


Problems 


7.1. A zero-seeking device operates as follows: If it is in state m at time
n, then at time n + 1 its position is uniformly distributed over the states
0, 1, ..., m - 1. State 0 is absorbing. Find the inverse of the I - Q
matrix for the transient states 1, 2, ..., m.


7.2. A zero-seeking device operates as follows: If it is in state j at time
n, then at time n + 1 its position is 0 with probability 1/j, and its position
is k (where k is one of the states 1, 2, ..., j - 1) with probability $2k/j^2$.
State 0 is absorbing. Find the inverse of the I - Q matrix.


7.3. Let $X_n$ be an absorbing Markov chain whose transition probability
matrix takes the form given in equation (7.1). Let W be the fundamental
matrix, the matrix inverse of I - Q. Let
$$
T = \min\{n \ge 0;\ r \le X_n \le N\}
$$
be the random time of absorption (recall that states r, r + 1, ..., N are
the absorbing states). Establish the joint distribution
$$
\Pr\{X_{T-1} = j,\ X_T = k \mid X_0 = i\} = W_{ij} P_{jk} \qquad \text{for } 0 \le i, j < r;\ r \le k \le N,
$$
whence
$$
\Pr\{X_{T-1} = j \mid X_0 = i\} = \sum_{k=r}^{N} W_{ij} P_{jk} \qquad \text{for } 0 \le i, j < r.
$$

7.4. The possible states for a Markov chain are the integers 0, 1, ...,
N, and if the chain is in state j, at the next step it is equally likely to be in
any of the states 0, 1, ..., j - 1. Formally,
$$
P_{ij} = \begin{cases}
1 & \text{if } i = j = 0, \\
0 & \text{if } 0 < i \le j \le N, \\
1/i & \text{if } 0 \le j < i \le N.
\end{cases}
$$
(a) Determine the fundamental matrix for the transient states
1, 2, ..., N.
(b) Determine the probability distribution for the last positive integer
that the chain visits.

7.5. Computer Challenge. Consider the partial sums
$$
S_0 = k \quad \text{and} \quad S_n = k + \xi_1 + \cdots + \xi_n, \qquad k > 0,
$$
where $\xi_1, \xi_2, \ldots$ are independent and identically distributed as
$$
\Pr\{\xi = 0\} = 1 - \frac{2}{\pi}
$$
and
$$
\Pr\{\xi = \pm j\} = \frac{2}{\pi(4j^2 - 1)}, \qquad j = 1, 2, \ldots.
$$
Can you find an explicit formula for the mean time $v_k$ for the partial sums
starting from $S_0 = k$ to exit the interval [0, N] = {0, 1, ..., N}? In another
context, the answer was found by computing it in a variety of special cases.

(Note: A simple random walk in the integer plane moves according to
the rule: If $(X_n, Y_n) = (i, j)$, then the next position is equally likely to be
any of the four points (i + 1, j), (i - 1, j), (i, j + 1), or (i, j - 1). Let us
suppose that the process starts at the point $(X_0, Y_0) = (k, k)$ on the diagonal,
and we observe the process only when it visits the diagonal. Formally,
we define
$$
T_1 = \min\{n > 0;\ X_n = Y_n\},
$$
and
$$
T_m = \min\{n > T_{m-1};\ X_n = Y_n\}.
$$
It is not hard to show that
$$
S_0 = k, \qquad S_m = X_{T_m} = Y_{T_m}, \quad m > 0,
$$
is a version of the above partial sum process.)


8. Branching Processes* 


Suppose an organism at the end of its lifetime produces a random number
$\xi$ of offspring with probability distribution
$$
\Pr\{\xi = k\} = p_k \qquad \text{for } k = 0, 1, 2, \ldots, \tag{8.1}
$$
where as usual, $p_k \ge 0$ and $\sum_{k=0}^{\infty} p_k = 1$. We assume that all offspring act
independently of each other and at the ends of their lifetimes (for
simplicity, the lifespans of all organisms are assumed to be the same)
individually have progeny in accordance with the probability distribution
(8.1), thus propagating their species. The process $\{X_n\}$, where $X_n$ is the
population size at the nth generation, is a Markov chain of special
structure called a branching process.

The Markov property may be reasoned simply as follows. In the
nth generation the $X_n$ individuals independently give rise to numbers of
offspring $\xi_1^{(n)}, \xi_2^{(n)}, \ldots, \xi_{X_n}^{(n)}$, and hence the cumulative number produced for
the (n + 1)st generation is
$$
X_{n+1} = \xi_1^{(n)} + \xi_2^{(n)} + \cdots + \xi_{X_n}^{(n)}. \tag{8.2}
$$

* Branching processes are Markov chains of a special type. Sections 8 and 9 are not
prerequisites to the later chapters.


8.1. Examples of Branching Processes 


There are numerous examples of Markov branching processes that arise
naturally in various scientific disciplines. We list some of the more
prominent cases.


Electron Multipliers 


An electron multiplier is a device that amplifies a weak current of elec- 
trons. A series of plates are set up in the path of electrons emitted by a 
source. Each electron, as it strikes the first plate, generates a random num- 
ber of new electrons, which in turn strike the next plate and produce more 
electrons, and so forth. Let $X_0$ be the number of electrons initially emitted,
$X_1$ the number of electrons produced on the first plate by the impact due
to the $X_0$ initial electrons; in general, let $X_n$ be the number of electrons
emitted from the nth plate due to electrons emanating from the (n - 1)st
plate. The sequence of random variables $X_0, X_1, X_2, \ldots, X_n, \ldots$ constitutes
a branching process.


Neutron Chain Reaction 


A nucleus is split by a chance collision with a neutron. The resulting fis- 
sion yields a random number of new neutrons. Each of these secondary 
neutrons may hit some other nucleus, producing a random number of
additional neutrons, and so forth. In this case the initial number of neutrons
is $X_0 = 1$. The first generation of neutrons comprises all those produced
from the fission caused by the initial neutron. The size of the first
generation is a random variable $X_1$. In general, the population $X_n$ at the nth
generation is produced by the chance hits of the $X_{n-1}$ individual neutrons of
the (n - 1)st generation.


Survival of Family Names


The family name is inherited by sons only. Suppose that each individual
has probability $p_k$ of having k male offspring. Then from one individual
there result the 1st, 2nd, ..., nth, ... generations of descendants. We may
investigate the distribution of such random variables as the number of
descendants in the nth generation, or the probability that the family name
will eventually become extinct. Such questions will be dealt with
beginning in Section 8.3.


Survival of Mutant Genes 


Each individual gene has a chance to give birth to k offspring, k =
1, 2, ..., which are genes of the same kind. Any individual, however, has
a chance to transform into a different type of mutant gene. This gene may 
become the first in a sequence of generations of a particular mutant gene. 
We may inquire about the chances of survival of the mutant gene within 
the population of the original genes. In this example, the number of off- 
spring is often assumed to follow a Poisson distribution. 

The rationale behind this choice of distribution is as follows. In many 
populations a large number of zygotes (fertilized eggs) are produced, only 
a small number of which grow to maturity. The events of fertilization and 
maturation of different zygotes obey the law of independent binomial
trials. The number of trials (i.e., number of zygotes) is large. The law of rare
events then implies that the number of progeny that mature will
approximately follow the Poisson distribution. The Poisson assumption seems
quite appropriate in the model of population growth of a rare mutant gene.
If the mutant gene carries a biological advantage (or disadvantage), then
the probability distribution is taken to be the Poisson distribution with
mean $\lambda > 1$ (or $\lambda < 1$).

All of the preceding examples possess the following structure. Let $X_0$
denote the size of the initial population. Each individual gives birth to k
new individuals with probability $p_k$ independently of the others. The
totality of all the direct descendants of the initial population constitutes the
first generation, whose size we denote by $X_1$. Each individual of the first
generation independently bears a progeny set whose size is governed by
the probability distribution (8.1). The descendants produced constitute the
second generation, of size $X_2$. In general, the nth generation is composed
of descendants of the (n - 1)st generation, each of whose members
independently produces k progeny with probability $p_k$, k = 0, 1, 2, .... The
population size of the nth generation is denoted by $X_n$. The $X_n$ form a
sequence of integer-valued random variables that generate a Markov chain
in the manner described by (8.2).


8.2. The Mean and Variance of a Branching Process 


Equation (8.2) characterizes the evolution of the branching process as
successive random sums of random variables. Random sums were studied in
II, Section 3, and we can use the moment formulas developed there to
compute the mean and variance of the population size $X_n$. First some
notation. Let $\mu = E[\xi]$ and $\sigma^2 = \mathrm{Var}[\xi]$ be the mean and variance,
respectively, of the offspring distribution (8.1). Let M(n) and V(n) be the mean
and variance of $X_n$ under the initial condition $X_0 = 1$. Then direct
application of II, (3.9) with respect to the random sum (8.2) gives the recursions
$$
M(n + 1) = \mu M(n) \tag{8.3}
$$
and
$$
V(n + 1) = \sigma^2 M(n) + \mu^2 V(n). \tag{8.4}
$$
The initial condition $X_0 = 1$ starts the recursions (8.3) and (8.4) at M(0) = 1
and V(0) = 0. Then, from (8.3) we obtain $M(1) = \mu \cdot 1 = \mu$, $M(2) = \mu M(1) = \mu^2$,
and, in general,
$$
M(n) = \mu^n \qquad \text{for } n = 0, 1, \ldots. \tag{8.5}
$$
Thus the mean population size increases geometrically when $\mu > 1$,
decreases geometrically when $\mu < 1$, and remains constant when $\mu = 1$.

Next, substitution of $M(n) = \mu^n$ into (8.4) gives $V(n + 1) = \sigma^2 \mu^n + \mu^2 V(n)$,
which with V(0) = 0 yields
$$
\begin{aligned}
V(1) &= \sigma^2, \\
V(2) &= \sigma^2 \mu + \mu^2 V(1) = \sigma^2 \mu + \sigma^2 \mu^2, \\
V(3) &= \sigma^2 \mu^2 + \mu^2 V(2) = \sigma^2 \mu^2 + \sigma^2 \mu^3 + \sigma^2 \mu^4,
\end{aligned}
$$
and, in general,
$$
\begin{aligned}
V(n) &= \sigma^2 \left[\mu^{n-1} + \mu^n + \cdots + \mu^{2n-2}\right] \\
     &= \sigma^2 \mu^{n-1} \left[1 + \mu + \cdots + \mu^{n-1}\right] \\
     &= \sigma^2 \mu^{n-1} \times \begin{cases} n & \text{if } \mu = 1, \\[4pt] \dfrac{1 - \mu^n}{1 - \mu} & \text{if } \mu \ne 1. \end{cases}
\end{aligned}
\tag{8.6}
$$
Thus the variance of the population size increases geometrically if
$\mu > 1$, increases linearly if $\mu = 1$, and decreases geometrically if $\mu < 1$.
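
The closed forms (8.5) and (8.6) can be checked against the recursions (8.3) and (8.4) directly. The following Python sketch does so; the function names and the sample values of the offspring mean and variance are assumptions made for illustration.

```python
# Minimal sketch: iterate the recursions (8.3) and (8.4) and compare with the
# closed forms (8.5) and (8.6). The values of mu and sigma2 are illustrative.

def moments_by_recursion(mu, sigma2, n):
    """Return (M(n), V(n)) starting from M(0) = 1, V(0) = 0."""
    m, v = 1.0, 0.0
    for _ in range(n):
        # V(n+1) uses M(n), so update v before overwriting m.
        v = sigma2 * m + mu ** 2 * v
        m = mu * m
    return m, v

def moments_closed_form(mu, sigma2, n):
    """Return (M(n), V(n)) from (8.5) and (8.6)."""
    m = mu ** n
    if n == 0:
        return m, 0.0
    if mu == 1:
        v = sigma2 * n
    else:
        v = sigma2 * mu ** (n - 1) * (1 - mu ** n) / (1 - mu)
    return m, v

# Offspring distribution with 0 offspring w.p. 1/4 and 2 w.p. 3/4:
# mu = 3/2 and sigma2 = E[xi^2] - mu^2 = 3 - 9/4 = 3/4.
for n in range(6):
    print(n, moments_by_recursion(1.5, 0.75, n), moments_closed_form(1.5, 0.75, n))
```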


8.3. Extinction Probabilities 


Population extinction occurs when and if the population size is reduced to
zero. The random time of extinction N is thus the first time n for which
$X_n = 0$, and then, obviously, $X_k = 0$ for all $k \ge N$. In Markov chain
terminology 0 is an absorbing state, and we may calculate the probability of
extinction by invoking a first step analysis. Let
$$
u_n = \Pr\{N \le n\} = \Pr\{X_n = 0\} \tag{8.7}
$$
be the probability of extinction at or prior to the nth generation, beginning
with a single parent $X_0 = 1$. Suppose that the single parent represented by
$X_0 = 1$ gives rise to $\xi_1^{(0)} = k$ offspring. In turn, each of these offspring will
generate a population of its own descendants, and if the original
population is to die out in n generations, then each of these k lines of descent
must die out in n - 1 generations. The analysis is depicted in Figure 8.1.

Now, the k subpopulations generated by the distinct offspring of the
original parent are independent, and they have the same statistical
properties as the original population. Therefore, the probability that any
particular one of them dies out in n - 1 generations is $u_{n-1}$ by definition, and the
probability that all k subpopulations die out in n - 1 generations is the kth
power $(u_{n-1})^k$ because they are independent. Upon weighting this factor by
the probability of k offspring and summing according to the law of total
probability, we obtain
$$
u_n = \sum_{k=0}^{\infty} p_k (u_{n-1})^k, \qquad n = 1, 2, \ldots. \tag{8.8}
$$
[Figure 8.1: tree diagram by generation (0, 1, 2, ..., n), starting from $X_0 = 1$ with $\xi_1^{(0)} = k$ offspring in generation 1, each heading one of the k subsequent subpopulations.]

Figure 8.1 The diagram illustrates that if the original population is to die out
by generation n, then the subpopulations generated by distinct
initial offspring must all die out in n - 1 generations.


Of course $u_0 = 0$, and $u_1 = p_0$, the probability that the original parent had
no offspring.


Example Suppose a parent has no offspring with probability 1/4 and two
offspring with probability 3/4. Then the recursion (8.8) specializes to
$$
u_n = \frac{1}{4} + \frac{3}{4}(u_{n-1})^2 = \frac{1 + 3(u_{n-1})^2}{4}.
$$
Beginning with $u_0 = 0$ we successively compute

    u_1 = 0.2500,    u_6  = 0.3313,
    u_2 = 0.2969,    u_7  = 0.3323,
    u_3 = 0.3161,    u_8  = 0.3328,
    u_4 = 0.3249,    u_9  = 0.3331,
    u_5 = 0.3292,    u_10 = 0.3332.

We see that the chances are very nearly 1/3 that such a population will die
out by the 10th generation.
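
The table above can be reproduced by iterating (8.8) directly. This is a minimal Python sketch; the function name and the list representation of the offspring distribution (entry k holds $p_k$) are assumptions of the sketch.

```python
# Minimal sketch: extinction probabilities u_n from the recursion (8.8).

def extinction_by_generation(p, n_max):
    """p[k] = Pr{xi = k}; return the list [u_0, u_1, ..., u_{n_max}]."""
    u = [0.0]                                   # u_0 = 0
    for _ in range(n_max):
        prev = u[-1]
        # u_n = sum_k p_k (u_{n-1})^k; note 0.0 ** 0 evaluates to 1.0.
        u.append(sum(pk * prev ** k for k, pk in enumerate(p)))
    return u

# No offspring with probability 1/4, two offspring with probability 3/4.
u = extinction_by_generation([0.25, 0.0, 0.75], 10)
for n, un in enumerate(u):
    print(n, round(un, 4))
```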


Exercises 


8.1. A population begins with a single individual. In each generation,
each individual in the population dies with probability 1/2 or doubles with
probability 1/2. Let $X_n$ denote the number of individuals in the population in
the nth generation. Find the mean and variance of $X_n$.


8.2. The number of offspring of an individual in a population is 0, 1, or
2 with respective probabilities a > 0, b > 0, and c > 0, where a + b +
c = 1. Express the mean and variance of the offspring distribution in
terms of b and c.


8.3. Suppose a parent has no offspring with probability ; and has two 
offspring with probability 5. If a population of such individuals begins 
with a single parent and evolves as a branching process, determine $u_n$, the
probability that the population is extinct by the nth generation, for n = 1,
2, 3, 4, 5.


8.4. At each stage of an electron multiplier, each electron, upon striking
the plate, generates a Poisson distributed number of electrons for the next
stage. Suppose the mean of the Poisson distribution is $\lambda$. Determine the
mean and variance for the number of electrons in the nth stage.


Problems 


8.1. Each adult individual in a population produces a fixed number M of
offspring and then dies. A fixed number L of these remain at the location
of the parent. These local offspring will either all grow to adulthood,
which occurs with a fixed probability $\beta$, or all will die, which has
probability $1 - \beta$. Local mortality is catastrophic in that it affects the entire
local population. The remaining N = M - L offspring disperse. Their
successful growth to adulthood will occur statistically independently of one
another, but at a lower probability $\alpha = p\beta$, where p may be thought of as
the probability of successfully surviving the dispersal process. Define
the random variable $\xi$ to be the number of offspring of a single parent that
survive to reach adulthood in the next generation. According to our
assumptions, we may write $\xi$ as
$$
\xi = v_1 + v_2 + \cdots + v_N + (M - N)\Theta,
$$
where $\Theta, v_1, v_2, \ldots, v_N$ are independent with $\Pr\{v_k = 1\} = \alpha$, $\Pr\{v_k = 0\} = 1 - \alpha$,
and with $\Pr\{\Theta = 1\} = \beta$ and $\Pr\{\Theta = 0\} = 1 - \beta$. Show that the
mean number of offspring reaching adulthood is $E[\xi] = \alpha N + \beta(M - N)$,
and since $\alpha < \beta$, the mean number of offspring is maximized by
dispersing none (N = 0). Show that the probability of having no offspring
surviving to adulthood is
$$
\Pr\{\xi = 0\} = (1 - \alpha)^N (1 - \beta),
$$
and that this probability is made smallest by making N large.


8.2. Let $Z = \sum_{n=0}^{\infty} X_n$ be the total family size in a branching process
whose offspring distribution has a mean $\mu = E[\xi] < 1$. Assuming that
$X_0 = 1$, show that $E[Z] = 1/(1 - \mu)$.


8.3. Families in a certain society choose the number of children that 
they will have according to the following rule: If the first child is a girl, 
they have exactly one more child. If the first child is a boy, they continue 
to have children until the first girl, and then cease childbearing. 


(a) For k = 0, 1, 2, ..., what is the probability that a particular
family will have k children in total?
(b) For k = 0, 1, 2, ..., what is the probability that a particular
family will have exactly k male children among their offspring?


8.4. Let $\{X_n\}$ be a branching process with mean family size $\mu$. Show
that $Z_n = X_n/\mu^n$ is a nonnegative martingale. Interpret the maximal
inequality as applied to $\{Z_n\}$.


9. Branching Processes and Generating Functions* 


Consider a nonnegative integer-valued random variable $\xi$ whose
probability distribution is given by
$$
\Pr\{\xi = k\} = p_k \qquad \text{for } k = 0, 1, \ldots. \tag{9.1}
$$

* This topic is not prerequisite to what follows.

The generating function $\phi(s)$ associated with the random variable $\xi$ (or
equivalently, with the distribution $\{p_k\}$) is defined by
$$
\phi(s) = E[s^{\xi}] = \sum_{k=0}^{\infty} p_k s^k \qquad \text{for } 0 \le s \le 1. \tag{9.2}
$$


Much of the importance of generating functions derives from the fol- 
lowing three results. 

First, the relation between probability mass functions (9.1) and
generating functions (9.2) is one-to-one. Thus knowing the generating function
is equivalent, in some sense, to knowing the distribution. The relation that
expresses the probability mass function $\{p_k\}$ in terms of the generating
function $\phi(s)$ is
$$
p_k = \frac{1}{k!} \left. \frac{d^k \phi(s)}{ds^k} \right|_{s=0}. \tag{9.3}
$$
For example,
$$
\phi(s) = p_0 + p_1 s + p_2 s^2 + \cdots,
$$
whence
$$
p_0 = \phi(0),
$$
and
$$
\frac{d\phi(s)}{ds} = p_1 + 2p_2 s + 3p_3 s^2 + \cdots,
$$
whence
$$
p_1 = \left. \frac{d\phi(s)}{ds} \right|_{s=0}.
$$
Second, if $\xi_1, \ldots, \xi_n$ are independent random variables having
generating functions $\phi_1(s), \ldots, \phi_n(s)$, respectively, then the generating function
of their sum $X = \xi_1 + \cdots + \xi_n$ is simply the product
$$
\phi_X(s) = \phi_1(s)\phi_2(s) \cdots \phi_n(s). \tag{9.4}
$$


This simple result makes generating functions extremely helpful in deal- 
ing with problems involving sums of independent random variables. It is 
to be expected, then, that generating functions might provide a major tool 
in the analysis of branching processes. 
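Third, the moments of a nonnegative integer-valued random variable
may be found by differentiating the generating function. For example, the
first derivative is
$$
\frac{d\phi(s)}{ds} = p_1 + 2p_2 s + 3p_3 s^2 + \cdots,
$$
whence
$$
\left. \frac{d\phi(s)}{ds} \right|_{s=1} = p_1 + 2p_2 + 3p_3 + \cdots = E[\xi], \tag{9.5}
$$
and the second derivative is
$$
\frac{d^2\phi(s)}{ds^2} = 2p_2 + 3(2)p_3 s + 4(3)p_4 s^2 + \cdots,
$$
whence
$$
\begin{aligned}
\left. \frac{d^2\phi(s)}{ds^2} \right|_{s=1} &= 2p_2 + 3(2)p_3 + 4(3)p_4 + \cdots \\
  &= \sum_{k=2}^{\infty} k(k - 1)p_k = E[\xi(\xi - 1)] \\
  &= E[\xi^2] - E[\xi].
\end{aligned}
$$
Thus
$$
E[\xi^2] = \left. \frac{d^2\phi(s)}{ds^2} \right|_{s=1} + E[\xi]
         = \left. \frac{d^2\phi(s)}{ds^2} \right|_{s=1} + \left. \frac{d\phi(s)}{ds} \right|_{s=1},
$$
and
$$
\begin{aligned}
\mathrm{Var}[\xi] &= E[\xi^2] - \{E[\xi]\}^2 \\
  &= \left. \frac{d^2\phi(s)}{ds^2} \right|_{s=1} + \left. \frac{d\phi(s)}{ds} \right|_{s=1} - \left\{ \left. \frac{d\phi(s)}{ds} \right|_{s=1} \right\}^2.
\end{aligned}
\tag{9.6}
$$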


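Example If $\xi$ has a Poisson distribution with mean $\lambda$ for which
$$
p_k = \Pr\{\xi = k\} = \frac{\lambda^k e^{-\lambda}}{k!} \qquad \text{for } k = 0, 1, \ldots,
$$
then
$$
\begin{aligned}
\phi(s) = E[s^{\xi}] &= \sum_{k=0}^{\infty} s^k \frac{\lambda^k e^{-\lambda}}{k!} \\
  &= e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda s)^k}{k!} \\
  &= e^{-\lambda} e^{\lambda s} = e^{-\lambda(1 - s)} \qquad \text{for } |s| < 1.
\end{aligned}
$$
Then
$$
\frac{d\phi(s)}{ds} = \lambda e^{-\lambda(1 - s)}, \qquad \left. \frac{d\phi(s)}{ds} \right|_{s=1} = \lambda;
$$
$$
\frac{d^2\phi(s)}{ds^2} = \lambda^2 e^{-\lambda(1 - s)}, \qquad \left. \frac{d^2\phi(s)}{ds^2} \right|_{s=1} = \lambda^2.
$$
From (9.5) and (9.6) we verify that
$$
E[\xi] = \lambda,
$$
$$
\mathrm{Var}[\xi] = \lambda^2 + \lambda - (\lambda)^2 = \lambda.
$$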
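
Formulas (9.5) and (9.6) are easy to apply when the probabilities $p_k$ are stored as a list, because $\phi'(1) = \sum_k k p_k$ and $\phi''(1) = \sum_k k(k - 1)p_k$. The Python sketch below checks the Poisson result numerically; the truncation of the Poisson distribution at 80 terms and the value $\lambda = 2.5$ are assumptions chosen for illustration.

```python
# Minimal sketch: mean and variance from the derivatives of a generating
# function at s = 1, following (9.5) and (9.6).
import math

def pgf_derivatives_at_one(p):
    """Return (phi'(1), phi''(1)) for phi(s) = sum_k p[k] s^k."""
    d1 = sum(k * pk for k, pk in enumerate(p))
    d2 = sum(k * (k - 1) * pk for k, pk in enumerate(p))
    return d1, d2

def mean_and_variance(p):
    """Mean from (9.5) and variance from (9.6)."""
    d1, d2 = pgf_derivatives_at_one(p)
    return d1, d2 + d1 - d1 ** 2

lam = 2.5
# Poisson probabilities truncated at k = 79; the neglected tail is negligible here.
poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(80)]
mean, var = mean_and_variance(poisson)
print(mean, var)
```

Both printed values should be close to $\lambda$, in agreement with the example.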


9.1. Generating Functions and Extinction Probabilities 


Consider a branching process whose population size at stage n is denoted
by $X_n$. Assume that the offspring distribution $p_k = \Pr\{\xi = k\}$ has the
generating function $\phi(s) = E[s^{\xi}] = \sum_k s^k p_k$. If $u_n = \Pr\{X_n = 0\}$ is the
probability of extinction by stage n, then the recursion (8.8) in terms of
generating functions becomes
$$
u_n = \sum_{k=0}^{\infty} p_k (u_{n-1})^k = \phi(u_{n-1}).
$$
That is, knowing the generating function $\phi(s)$, we may successively
compute the extinction probabilities $u_n$, beginning with $u_0 = 0$ and then
$u_1 = \phi(u_0)$, $u_2 = \phi(u_1)$, and so on.
Example The extinction probabilities when there are no offspring with
probability $p_0 = 1/4$ and two offspring with probability $p_2 = 3/4$ were
computed in the example in Section 8.3. We now reexamine this example
using the offspring generating function $\phi(s) = \frac{1}{4} + \frac{3}{4}s^2$. This generating
function is plotted as Figure 9.1. From the picture it is clear that the
extinction probabilities converge upward to the smallest solution of the
equation $u = \phi(u)$. This, in fact, occurs in the most general case. If $u_\infty$
denotes this smallest solution to $u = \phi(u)$, then $u_\infty$ gives the probability that
the population eventually becomes extinct at some indefinite, but finite,
time. The alternative is that the population grows infinitely large, and this
occurs with probability $1 - u_\infty$.

For the example at hand, $\phi(s) = \frac{1}{4} + \frac{3}{4}s^2$, and the equation $u = \phi(u)$ is
the simple quadratic $u = \frac{1}{4} + \frac{3}{4}u^2$, which gives
$$
u = \frac{4 \pm \sqrt{16 - 12}}{6} = 1, \tfrac{1}{3}.
$$

[Figure 9.1: plot of $\phi(s)$ against s together with the 45° line, showing the iterates $u_1, u_2, \ldots$.]

Figure 9.1 The generating function corresponding to the offspring
distribution $p_0 = 1/4$ and $p_2 = 3/4$. Here $u_k = \Pr\{X_k = 0\}$ is the probability of
extinction by generation k.

The smaller solution is $u_\infty = 1/3$, which is to be compared with the apparent
limit of the sequence $u_n$ computed in the example in Section 8.3.

It may happen that $u_\infty = 1$, that is, the population is sure to die out at
some time. An example is depicted in Figure 9.2: The offspring
distribution is $p_0 = 3/4$ and $p_2 = 1/4$. We solve $u = \phi(u) = \frac{3}{4} + \frac{1}{4}u^2$ to obtain
$$
u = \frac{4 \pm \sqrt{16 - 12}}{2} = 1, 3.
$$
The smaller solution is $u_\infty = 1$, the probability of eventual extinction.

In general, the key is whether or not the generating function $\phi(s)$
crosses the 45° line $\phi(s) = s$, and this, in turn, can be determined from the
slope
$$
\phi'(1) = \left. \frac{d\phi(s)}{ds} \right|_{s=1}
$$
of the generating function at s = 1. If this slope is less than or equal to
one, then no crossing takes place, and the probability of eventual
extinction is $u_\infty = 1$. On the other hand, if the slope $\phi'(1)$ exceeds one, then the
equation $u = \phi(u)$ has a smaller solution that is less than one, and
extinction is not a certain event.

[Figure 9.2: plot of $\phi(s)$ against s together with the 45° line.]

Figure 9.2 The generating function corresponding to the offspring
distribution $p_0 = 3/4$ and $p_2 = 1/4$.

But the slope $\phi'(1)$ of a generating function at s = 1 is the mean $E[\xi]$
of the corresponding distribution. We have thus arrived at the following
important conclusion: If the mean offspring size $E[\xi] \le 1$, then $u_\infty = 1$
and extinction is certain. If $E[\xi] > 1$, then $u_\infty < 1$ and the population may
grow unboundedly with positive probability.

The borderline case $E[\xi] = 1$ merits some special attention. Here
$E[X_n \mid X_0 = 1] = 1$ for all n, so the mean population size is constant. Yet the
population is sure to die out eventually! This is a simple example in which
the mean population size alone does not adequately describe the
population behavior.
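
The dichotomy between $E[\xi] \le 1$ and $E[\xi] > 1$ can be observed numerically by iterating $u \leftarrow \phi(u)$ from $u_0 = 0$ until the value settles. The Python sketch below does this for the two offspring distributions of Figures 9.1 and 9.2; the function name and the fixed iteration count are assumptions of the sketch rather than anything prescribed by the text.

```python
# Minimal sketch: mean offspring number and eventual extinction probability,
# obtained by iterating u <- phi(u) starting from u_0 = 0.

def eventual_extinction(p, iterations=2000):
    """Return (mean offspring, approximate u_infinity) for probabilities p[k]."""
    mean = sum(k * pk for k, pk in enumerate(p))
    u = 0.0
    for _ in range(iterations):
        u = sum(pk * u ** k for k, pk in enumerate(p))
    return mean, u

mean_a, u_a = eventual_extinction([0.25, 0.0, 0.75])   # p_0 = 1/4, p_2 = 3/4
mean_b, u_b = eventual_extinction([0.75, 0.0, 0.25])   # p_0 = 3/4, p_2 = 1/4
print(mean_a, u_a)   # mean above one, extinction probability below one
print(mean_b, u_b)   # mean below one, extinction certain
```

In the borderline case $E[\xi] = 1$ the iterates still increase to one, but they can be expected to do so much more slowly than in these two examples, so a fixed iteration count gives only a rough answer there.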


9.2. Probability Generating Functions and 
Sums of Independent Random Variables 


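Let $\xi$ and $\eta$ be independent nonnegative integer-valued random variables
having the probability generating functions (p.g.f.s)
$$
\phi(s) = E[s^{\xi}] \quad \text{and} \quad \psi(s) = E[s^{\eta}] \qquad \text{for } |s| < 1.
$$
The probability generating function of the sum $\xi + \eta$ is simply the
product $\phi(s)\psi(s)$ because
$$
\begin{aligned}
E[s^{\xi + \eta}] &= E[s^{\xi} s^{\eta}] \\
  &= E[s^{\xi}]E[s^{\eta}] \qquad (\text{because } \xi \text{ and } \eta \text{ are independent}) \\
  &= \phi(s)\psi(s).
\end{aligned}
\tag{9.7}
$$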


The converse is also true. Specifically, if the product of the p.g.f.s of 
two independent random variables is a p.g.f. of a third random variable, 
then the third random variable equals (in distribution) the sum of the other 
two. 

Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed nonnegative
integer-valued random variables with p.g.f. $\phi(s) = E[s^{\xi}]$. Direct induction
of (9.7) implies that the sum $\xi_1 + \cdots + \xi_n$ has p.g.f.
$$
E[s^{\xi_1 + \cdots + \xi_n}] = [\phi(s)]^n. \tag{9.8}
$$
We extend this result to determine the p.g.f. of a sum of a random number
of independent summands. Accordingly, let N be a nonnegative
integer-valued random variable, independent of $\xi_1, \xi_2, \ldots$, with p.g.f. $g_N(s) = E[s^N]$,
and consider the random sum (see II, Section 3)
$$
X = \xi_1 + \cdots + \xi_N.
$$
Let $h_X(s) = E[s^X]$ be the p.g.f. of X. We claim that $h_X(s)$ takes the simple
form
$$
h_X(s) = g_N[\phi(s)]. \tag{9.9}
$$
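To establish (9.9), consider
$$
\begin{aligned}
h_X(s) &= \sum_{k=0}^{\infty} \Pr\{X = k\} s^k \\
  &= \sum_{k=0}^{\infty} \left( \sum_{n=0}^{\infty} \Pr\{X = k \mid N = n\} \Pr\{N = n\} \right) s^k \\
  &= \sum_{k=0}^{\infty} \left( \sum_{n=0}^{\infty} \Pr\{\xi_1 + \cdots + \xi_n = k \mid N = n\} \Pr\{N = n\} \right) s^k \\
  &= \sum_{k=0}^{\infty} \left( \sum_{n=0}^{\infty} \Pr\{\xi_1 + \cdots + \xi_n = k\} \Pr\{N = n\} \right) s^k
     \qquad [\text{because } N \text{ is independent of } \xi_1, \xi_2, \ldots] \\
  &= \sum_{n=0}^{\infty} \left( \sum_{k=0}^{\infty} \Pr\{\xi_1 + \cdots + \xi_n = k\} s^k \right) \Pr\{N = n\} \\
  &= \sum_{n=0}^{\infty} \phi(s)^n \Pr\{N = n\} \qquad [\text{using (9.8)}] \\
  &= g_N[\phi(s)] \qquad [\text{by the definition of } g_N(s)].
\end{aligned}
$$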
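
Identity (9.9) can be verified exactly for distributions with finite support by treating each p.g.f. as a polynomial whose coefficient of $s^k$ is the probability of the value k. The Python sketch below, under that assumption, computes the distribution of X in two ways: by composing the polynomials as in (9.9), and by conditioning on N = n and convolving n copies of the summand distribution. The two example distributions and all function names are hypothetical.

```python
# Minimal sketch: check h_X(s) = g_N(phi(s)) for finitely supported distributions.
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Sum of two polynomials, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = a + [Fraction(0)] * (n - len(a))
    b = b + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]

def compose(g, phi):
    """Coefficients of g(phi(s)) by Horner's rule; g must be nonempty."""
    result = [g[-1]]
    for coeff in reversed(g[:-1]):
        result = poly_add(poly_mul(result, phi), [coeff])
    return result

def random_sum_distribution(g, phi):
    """Pr{X = k} by conditioning on N = n and using the n-fold convolution (9.8)."""
    total = [Fraction(0)]
    power = [Fraction(1)]                 # empty sum: X = 0 with probability one
    for pn in g:
        total = poly_add(total, [pn * c for c in power])
        power = poly_mul(power, phi)
    return total

phi = [Fraction(1, 4), Fraction(0), Fraction(3, 4)]     # summands take values 0 or 2
g_N = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # N takes values 0, 1, 2

h = compose(g_N, phi)
direct = random_sum_distribution(g_N, phi)
print(h)
print(direct)
```

Exact rational arithmetic is used so that the two coefficient lists can be compared for equality rather than up to rounding.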


With the aid of (9.9), the basic branching process equation
$$
X_{n+1} = \xi_1^{(n)} + \xi_2^{(n)} + \cdots + \xi_{X_n}^{(n)} \tag{9.10}
$$
can be expressed equivalently and succinctly by means of generating
functions. To this end, let $\phi_n(s) = E[s^{X_n}]$ be the p.g.f. of the population size
$X_n$ at generation n, assuming that $X_0 = 1$. Then easily, $\phi_0(s) = E[s^1] = s$,
and $\phi_1(s) = \phi(s) = E[s^{\xi}]$. To obtain the general expression, we apply (9.9)
to (9.10) to yield
$$
\phi_{n+1}(s) = \phi_n[\phi(s)]. \tag{9.11}
$$
This expression may be iterated in the manner
$$
\begin{aligned}
\phi_{n+1}(s) &= \phi_{n-1}\{\phi[\phi(s)]\} \\
  &= \underbrace{\phi\{\phi[\cdots \phi(s)]\}}_{(n + 1) \text{ iterations}} \\
  &= \phi[\phi_n(s)].
\end{aligned}
\tag{9.12}
$$

That is, we obtain the generating function for the population size $X_n$ at
generation n, given that $X_0 = 1$, by repeated substitution in the
probability generating function of the offspring distribution.

For general initial population sizes $X_0 = k$, the p.g.f. is
$$
\sum_{j=0}^{\infty} \Pr\{X_n = j \mid X_0 = k\} s^j = [\phi_n(s)]^k, \tag{9.13}
$$
exactly that of a sum of k independent lines of descent. From this
perspective, the branching process evolves as the sum of k independent
branching processes, one for each initial parent.


Example Let $\phi(s) = q + ps$, where 0 < p < 1 and p + q = 1. The
associated branching process is a pure death process. In each period each
individual dies with probability q and survives with probability p. The
iterates $\phi_n(s)$ in this case are readily determined, e.g., $\phi_2(s) = q + p(q + ps) = 1 - p^2 + p^2 s$,
and generally, $\phi_n(s) = 1 - p^n + p^n s$. If we follow (9.13), the nth generation
p.g.f. starting from an initial population size of k is
$[\phi_n(s)]^k = [1 - p^n + p^n s]^k$.

The probability distribution of the time T to extinction may be
determined from the p.g.f. as follows:
$$
\begin{aligned}
\Pr\{T = n \mid X_0 = k\} &= \Pr\{X_n = 0 \mid X_0 = k\} - \Pr\{X_{n-1} = 0 \mid X_0 = k\} \\
  &= [\phi_n(0)]^k - [\phi_{n-1}(0)]^k \\
  &= (1 - p^n)^k - (1 - p^{n-1})^k.
\end{aligned}
$$
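
For the pure death process the values $\phi_n(0)$ can also be generated by the iteration $\phi_n(0) = \phi[\phi_{n-1}(0)]$ from (9.12), which gives a numerical check on the closed form for the extinction time. The Python sketch below takes this route; the function name and the sample values p = 0.5, k = 3 are assumptions made for illustration.

```python
# Minimal sketch: distribution of the extinction time T for the pure death
# process with offspring p.g.f. phi(s) = q + p s, starting from X_0 = k.

def extinction_time_pmf(p, k, n_max):
    """Return [Pr{T = 1}, ..., Pr{T = n_max}] given X_0 = k."""
    q = 1.0 - p
    pmf = []
    prev = 0.0                       # phi_0(0) = 0
    for _ in range(n_max):
        cur = q + p * prev           # phi_n(0) = phi(phi_{n-1}(0))
        pmf.append(cur ** k - prev ** k)
        prev = cur
    return pmf

p, k = 0.5, 3
pmf = extinction_time_pmf(p, k, 200)
print(pmf[:5])
print(sum(pmf))
```

The probabilities should sum to one up to rounding, since extinction is certain for this process.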


9.3. Multiple Branching Processes 


Population growth processes often involve several life history phases
(e.g., juvenile, reproductive adult, senescence) with different viability and
behavioral patterns. We consider a number of examples of branching
processes that take account of this characteristic.

For the first example, suppose that a mature individual produces
offspring according to the p.g.f. $\phi(s)$. Consider a population of immature
individuals each of which grows to maturity with probability p and then
reproduces independently of the status of the remaining members of the
population. With probability 1 - p an immature individual will not attain
maturity and thus will leave no descendants. With probability p an
individual will reach maturity and reproduce a number of offspring
determined according to the p.g.f. $\phi(s)$. Therefore the progeny size distribution
(or equivalently the p.g.f.) of a typical immature individual taking account
of both contingencies is
$$
(1 - p) + p\phi(s). \tag{9.14}
$$


If a census is taken of individuals at the adult (mature) stage, the
aggregate number of mature individuals contributed by a mature individual
will now have p.g.f.
$$
\phi(1 - p + ps). \tag{9.15}
$$
(The student should verify this finding.)
It is worth emphasis that the p.g.f.s (9.14) and (9.15) have the same
mean $p\phi'(1)$ but generally not the same variance, the first being
$$
p\left[\phi''(1) + \phi'(1) - p(\phi'(1))^2\right]
$$
as compared to
$$
p^2\phi''(1) + p\phi'(1) - p^2(\phi'(1))^2.
$$


Example A second example leading to (9.15), as opposed to (9.14),
concerns the different forms of mortality that affect a population. We
appraise the strength (stability) of a population as the probability of
indefinite survivorship = 1 - probability of eventual extinction.

In the absence of mortality the offspring number X of a single
individual has the p.g.f. $\phi(s)$. Assume, consistent with the postulates of a
branching process, that all offspring in the population behave independently,
governed by the same probability laws. Assume also an adult population of
size X = k. We consider three types of mortality:


(a) Mortality of Individuals Let p be the probability of an offspring
surviving to reproduce, independently of what happens to others. Thus the
contribution of each litter (family) to the adult population of the next
generation has a binomial distribution with parameters (N, p), where N is the
progeny size of the parent with p.g.f. $\phi(s)$. The p.g.f. of the adult numbers
contributed by a single grandparent is therefore $\phi(q + ps)$, q = 1 - p, and
for the population as a whole is
$$
\psi_1(s) = [\phi(q + ps)]^k. \tag{9.16}
$$
This type of mortality might reflect predation on adults.


(b) Mortality of Litters Independently of what happens to other litters,
each litter survives with probability $p$ and is wiped out with probability
$q = 1 - p$. That is, given an actual litter size $\xi$, the effective litter
size is $\xi$ with probability $p$, and 0 with probability $q$. The p.g.f. of
adults in the following generation is accordingly

$$\psi_2(s) = [q + p\varphi(s)]^k. \tag{9.17}$$


This type of mortality might reflect predation on juveniles or on nests and 
eggs in the case of birds. 


(c) Mortality of Generations An entire generation survives with
probability $p$ and is wiped out with probability $q$. This type of mortality
might represent environmental catastrophes (e.g., forest fire, flood). The
p.g.f. of population size in the next generation in this case is

$$\psi_3(s) = q + p[\varphi(s)]^k. \tag{9.18}$$


All the p.g.f.s (9.16) through (9.18) have the same mean but usually dif- 
ferent variances. 

It is interesting to assess the relative stability of these three models.
That is, we need to compare the smallest positive roots of $\psi_i(s) = s$,
$i = 1, 2, 3$, which we will denote by $s_i^*$, $i = 1, 2, 3$, respectively.

We will show by convexity analysis that

$$\psi_1(s) \le \psi_2(s) \le \psi_3(s).$$

A function $f(x)$ is convex in $x$ if for every $x_1$ and $x_2$ and
$0 < \lambda < 1$, we have
$f[\lambda x_1 + (1 - \lambda)x_2] \le \lambda f(x_1) + (1 - \lambda)f(x_2)$.
In particular, the function $\varphi(s) = \sum_{k=0}^{\infty} p_k s^k$ for
$0 < s < 1$ is convex in $s$, since for each positive integer $k$,
$[\lambda s_1 + (1 - \lambda)s_2]^k \le \lambda s_1^k + (1 - \lambda)s_2^k$ for
$0 < \lambda, s_1, s_2 < 1$. Now,
$\psi_1(s) = [\varphi(q + ps)]^k \le [q\varphi(1) + p\varphi(s)]^k = [q + p\varphi(s)]^k = \psi_2(s)$,
and then $s_1^* < s_2^*$. Thus the first model is more stable than the second.

Observe further that due to the convexity of $f(x) = x^k$, $x > 0$,
$\psi_2(s) = [p\varphi(s) + q]^k \le p[\varphi(s)]^k + q \times 1^k = \psi_3(s)$,
and thus $s_2^* < s_3^*$, implying that the second model is more stable than
the third model. In conjunction we get the ordering
$s_1^* < s_2^* < s_3^*$.
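
The ordering of the three extinction probabilities can be checked numerically. The following Python sketch is only an illustration under assumed values that do not come from the text: Poisson offspring with mean 3, survival probability p = 0.6, and k = 2 initial adults. It finds each smallest root by iterating s <- psi(s) from s = 0, which climbs monotonically to the smallest nonnegative fixed point of an increasing p.g.f.

```python
import math

def smallest_fixed_point(psi, tol=1e-12, max_iter=100_000):
    """Iterate s <- psi(s) from s = 0 until successive values agree.

    For an increasing p.g.f. the iterates increase to the smallest
    nonnegative root of psi(s) = s.
    """
    s = 0.0
    for _ in range(max_iter):
        nxt = psi(s)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s

# Assumed illustrative parameters (hypothetical, not from the text).
LAM, P_SURV, K = 3.0, 0.6, 2
Q = 1.0 - P_SURV

def phi(s):
    """Poisson offspring p.g.f. with mean LAM."""
    return math.exp(LAM * (s - 1.0))

def psi1(s):
    """Mortality of individuals, equation (9.16)."""
    return phi(Q + P_SURV * s) ** K

def psi2(s):
    """Mortality of litters, equation (9.17)."""
    return (Q + P_SURV * phi(s)) ** K

def psi3(s):
    """Mortality of generations, equation (9.18)."""
    return Q + P_SURV * phi(s) ** K

roots = [smallest_fixed_point(f) for f in (psi1, psi2, psi3)]
for name, root in zip(("s1*", "s2*", "s3*"), roots):
    print(name, round(root, 6))
```

All three models share the same mean, so any difference between the printed roots comes from how the mortality is applied rather than from its average effect.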


Exercises 


9.1. Suppose that the offspring distribution is Poisson with mean
$\lambda = 1.1$. Compute the extinction probabilities
$u_n = \Pr\{X_n = 0 \mid X_0 = 1\}$ for $n = 0, 1, \ldots, 5$. What is
$u_\infty$, the probability of ultimate extinction?


9.2. Determine the probability generating function for the offspring
distribution in which an individual either dies, with probability $p_0$, or is
replaced by two progeny, with probability $p_2$, where $p_0 + p_2 = 1$.


9.3. Determine the probability generating function corresponding to the 
offspring distribution in which each individual produces 0 or N direct de- 
scendants, with probabilities p and q, respectively. 


9.4. Let $\varphi(s)$ be the generating function of an offspring random
variable $\xi$. Let $Z$ be a random variable whose distribution is that of
$\xi$, but conditional on $\xi > 0$. That is,

$$\Pr\{Z = k\} = \Pr\{\xi = k \mid \xi > 0\} \quad \text{for } k = 1, 2, \ldots.$$

Express the generating function for $Z$ in terms of $\varphi$.


Problems 


9.1. One-fourth of the married couples in a far-off society have no chil- 
dren at all. The other three-fourths of families have exactly three children, 
each child equally likely to be a boy or a girl. What is the probability that 
the male line of descent of a particular husband will eventually die out? 


9.2. One-fourth of the married couples in a far-off society have exactly 
three children. The other three-fourths of families continue to have chil- 
dren until the first boy and then cease childbearing. Assume that each 
child is equally likely to be a boy or girl. What is the probability that the 
male line of descent of a particular husband will eventually die out? 


9.3. Consider a large region consisting of many subareas. Each subarea
contains a branching process that is characterized by a Poisson
distribution with parameter $\lambda$. Assume, furthermore, that the value of
$\lambda$ varies with the subarea, and its distribution over the whole region
is that of a gamma distribution. Formally, suppose that the offspring
distribution is given by

$$\pi(k \mid \lambda) = \frac{e^{-\lambda}\lambda^k}{k!} \quad \text{for } k = 0, 1, \ldots,$$

where $\lambda$ itself is a random variable having the density function

$$f(\lambda) = \frac{\theta^{\alpha}\lambda^{\alpha - 1}e^{-\theta\lambda}}{\Gamma(\alpha)} \quad \text{for } \lambda > 0,$$

where $\theta$ and $\alpha$ are positive constants. Determine the marginal
offspring distribution $p_k = \int \pi(k \mid \lambda) f(\lambda)\, d\lambda$.

Hint: Refer to the last example of II, Section 4.


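9.4. Let $\varphi(s) = 1 - p(1 - s)^{\beta}$, where $p$ and $\beta$ are
constants with $0 < p, \beta < 1$. Prove that $\varphi(s)$ is a probability
generating function and that its iterates are

$$\varphi_n(s) = 1 - p^{1 + \beta + \cdots + \beta^{n-1}}(1 - s)^{\beta^n} \quad \text{for } n = 1, 2, \ldots.$$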


9.5. At time 0, a blood culture starts with one red cell. At the end of one
minute, the red cell dies and is replaced by one of the following combi- 
nations with the probabilities as indicated: 


2 red cells 
1 red, 1 white 
2 white 


Tol eo lhe Sat 


Each red cell lives for one minute and gives birth to offspring in the same 
way as the parent cell. Each white cell lives for one minute and dies with- 
out reproducing. Assume that individual cells behave independently. 


(a) At time n + 3 minutes after the culture begins, what is the proba- 
bility that no white cells have yet appeared? 

(b) What is the probability that the entire culture eventually dies out 
entirely? 


9.6. Let $\varphi(s) = as^2 + bs + c$, where $a$, $b$, $c$ are positive and
$\varphi(1) = 1$. Assume that the probability of extinction is $u_\infty$,
where $0 < u_\infty < 1$. Prove that $u_\infty = c/a$.


9.7. Families in a certain society choose the number of children that 
they will have according to the following rule: If the first child is a girl, 
they have exactly one more child. If the first child is a boy, they continue 
to have children until the first girl and then cease childbearing. Let $\xi$ be
the number of male children in a particular family. What is the generating
function of $\xi$? Determine the mean of $\xi$ directly and by differentiating the
generating function. 


9.8. Consider a branching process whose offspring follow the geometric
distribution $p_k = (1 - c)c^k$ for $k = 0, 1, \ldots$, where $0 < c < 1$.
Determine the probability of eventual extinction.


9.9. One-fourth of the married couples in a distant society have no chil- 
dren at all. The other three-fourths of families continue to have children 
until the first girl and then cease childbearing. Assume that each child is 
equally likely to be a boy or girl. 


(a) For $k = 0, 1, 2, \ldots$, what is the probability that a particular
husband will have $k$ male offspring?

(b) What is the probability that the husband’s male line of descent will 
cease to exist by the 5th generation? 


9.10. Suppose that in a branching process the number of offspring of an 
initial particle has a distribution whose generating function is f(s). Each 
member of the first generation has a number of offspring whose distribu- 
tion has generating function g(s). The next generation has generating 
function f, the next has g, and the distributions continue to alternate in this 
way from generation to generation. 


(a) Determine the extinction probability of the process in terms of f(s) 
and g(s). 

(b) Determine the mean population size at generation n. 

(c) Would any of these quantities change if the process started with the 
g(s) process and then continued to alternate? 


Chapter IV 


The Long Run Behavior of 
Markov Chains 


1. Regular Transition Probability Matrices 


Suppose that a transition probability matrix $P = \|P_{ij}\|$ on a finite
number of states labeled $0, 1, \ldots, N$ has the property that when raised
to some power $k$, the matrix $P^k$ has all of its elements strictly positive.
Such a transition probability matrix, or the corresponding Markov chain, is
called regular. The most important fact concerning a regular Markov chain
is the existence of a limiting probability distribution
$\pi = (\pi_0, \pi_1, \ldots, \pi_N)$, where $\pi_j > 0$ for
$j = 0, 1, \ldots, N$ and $\sum_j \pi_j = 1$, and this distribution is
independent of the initial state. Formally, for a regular transition
probability matrix $P = \|P_{ij}\|$ we have the convergence

$$\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j > 0 \quad \text{for } j = 0, 1, \ldots, N,$$

or, in terms of the Markov chain $\{X_n\}$,

$$\lim_{n \to \infty} \Pr\{X_n = j \mid X_0 = i\} = \pi_j > 0 \quad \text{for } j = 0, 1, \ldots, N.$$

This convergence means that, in the long run ($n \to \infty$), the
probability of finding the Markov chain in state $j$ is approximately
$\pi_j$ no matter in which state the chain began at time 0.


Example The Markov chain whose transition probability matrix is

               0       1
    P = 0 || 1 - a     a   ||                                  (1.1)
        1 ||   b     1 - b ||

is regular when $0 < a, b < 1$, and in this case the limiting distribution is
$\pi = (b/(a + b),\ a/(a + b))$. To give a numerical example, we will suppose
that

    P = || 0.33  0.67 ||
        || 0.75  0.25 ||.

The first several powers of P are given here:

    P^2 = || 0.6114  0.3886 ||     P^3 = || 0.4932  0.5068 ||
          || 0.4350  0.5650 ||,          || 0.5673  0.4327 ||,

    P^4 = || 0.5428  0.4572 ||     P^5 = || 0.5220  0.4780 ||
          || 0.5117  0.4883 ||,          || 0.5350  0.4650 ||,

    P^6 = || 0.5307  0.4693 ||     P^7 = || 0.5271  0.4729 ||
          || 0.5253  0.4747 ||,          || 0.5294  0.4706 ||.

By n = 7, the entries agree row-to-row to two decimal places. The limiting
probabilities are $b/(a + b) = 0.5282$ and $a/(a + b) = 0.4718$.
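
As a rough check on the table of powers, the sketch below recomputes them in plain Python. The helper names `mat_mul` and `mat_pow` are illustrative choices, not notation from the text, and the printed values are rounded to four decimals, so the last digit may differ slightly from a hand computation.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """Return p raised to the n-th power (n >= 1) by repeated multiplication."""
    result = p
    for _ in range(n - 1):
        result = mat_mul(result, p)
    return result

a, b = 0.67, 0.75
P = [[1 - a, a], [b, 1 - b]]

# Limiting distribution for the two-state chain, as stated in the example.
limit = (b / (a + b), a / (a + b))

for n in range(2, 8):
    rows = mat_pow(P, n)
    print(n, [[round(x, 4) for x in row] for row in rows])
print("limit", [round(x, 4) for x in limit])
```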


Example Sociologists often assume that the social classes of successive
generations in a family can be regarded as a Markov chain. Thus, the
occupation of a son is assumed to depend only on his father's occupation
and not on his grandfather's. Suppose that such a model is appropriate and
that the transition probability matrix is given by

                            Son's Class
                      Lower   Middle   Upper
    Father's  Lower  || 0.40    0.50    0.10 ||
    Class     Middle || 0.05    0.70    0.25 ||
              Upper  || 0.05    0.50    0.45 ||

For such a population, what fraction of people are middle class in the long
run?


For the time being, we will answer the question by computing sufficiently
high powers of $P^n$. A better method for determining the limiting
distribution will be presented later in this section.

We compute

                      || 0.1900  0.6000  0.2100 ||
    P^2 = P × P     = || 0.0675  0.6400  0.2925 ||,
                      || 0.0675  0.6000  0.3325 ||

                      || 0.0908  0.6240  0.2852 ||
    P^4 = P^2 × P^2 = || 0.0758  0.6256  0.2986 ||,
                      || 0.0758  0.6240  0.3002 ||

                      || 0.0772  0.6250  0.2978 ||
    P^8 = P^4 × P^4 = || 0.0769  0.6250  0.2981 ||.
                      || 0.0769  0.6250  0.2981 ||

Note that we have not computed $P^n$ for consecutive values of $n$ but have
sped up the calculations by evaluating the successive squares $P^2$, $P^4$,
$P^8$.

In the long run, approximately 62.5 percent of the population are mid- 
dle class under the assumptions of the model. 
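
The successive-squaring shortcut can be sketched in a few lines of Python. The function name `successive_squares` is an illustrative choice; the matrix is the social class matrix of the example.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def successive_squares(p, squarings):
    """Return [P^2, P^4, ..., P^(2**squarings)] by repeated squaring."""
    powers, current = [], p
    for _ in range(squarings):
        current = mat_mul(current, current)
        powers.append(current)
    return powers

P = [[0.40, 0.50, 0.10],
     [0.05, 0.70, 0.25],
     [0.05, 0.50, 0.45]]

P2, P4, P8 = successive_squares(P, 3)
for label, matrix in (("P^2", P2), ("P^4", P4), ("P^8", P8)):
    print(label)
    for row in matrix:
        print("  ", [round(x, 4) for x in row])
```

Three matrix multiplications reach the eighth power, whereas multiplying by P one step at a time would take seven.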

Computing the limiting distribution by raising the transition probability
matrix to a high power suffers from being inexact, since $n = \infty$ is
never attained, and it also requires more computational effort than is
necessary. Theorem 1.1 provides an alternative computational approach by
asserting that the limiting distribution is the unique solution to a set of
linear equations. For this social class example, the exact limiting
distribution, computed using the method of Theorem 1.1, is
$\pi_0 = \frac{1}{13} = 0.0769$, $\pi_1 = \frac{5}{8} = 0.6250$, and
$\pi_2 = \frac{31}{104} = 0.2981$.

If a transition probability matrix $P$ on $N$ states is regular, then
$P^{N^2}$ will have no zero elements. Conversely, if $P^{N^2}$ is not strictly
positive, then the Markov chain is not regular. Furthermore, once it happens
that $P^k$ has no zero entries, then every higher power $P^{k+n}$,
$n = 1, 2, \ldots$, will have no zero entries. Thus it suffices to check the
successive squares $P, P^2, P^4, P^8, \ldots$. Finally, to determine whether
or not the square of a transition probability matrix has only strictly
positive entries, it is not necessary to perform the actual multiplication,
but only to record whether or not the product is nonzero.


Example Consider the transition probability matrix

        || 0.9  0.1   0    0    0    0    0  ||
        || 0.9   0   0.1   0    0    0    0  ||
        || 0.9   0    0   0.1   0    0    0  ||
    P = || 0.9   0    0    0   0.1   0    0  ||.
        || 0.9   0    0    0    0   0.1   0  ||
        || 0.9   0    0    0    0    0   0.1 ||
        || 0.9   0    0    0    0    0   0.1 ||

We recognize this as a success runs Markov chain. We record the nonzero
entries as + and write P × P in the form

            ||+ + 0 0 0 0 0||   ||+ + 0 0 0 0 0||   ||+ + + 0 0 0 0||
            ||+ 0 + 0 0 0 0||   ||+ 0 + 0 0 0 0||   ||+ + 0 + 0 0 0||
            ||+ 0 0 + 0 0 0||   ||+ 0 0 + 0 0 0||   ||+ + 0 0 + 0 0||
    P × P = ||+ 0 0 0 + 0 0|| × ||+ 0 0 0 + 0 0|| = ||+ + 0 0 0 + 0|| = P^2,
            ||+ 0 0 0 0 + 0||   ||+ 0 0 0 0 + 0||   ||+ + 0 0 0 0 +||
            ||+ 0 0 0 0 0 +||   ||+ 0 0 0 0 0 +||   ||+ + 0 0 0 0 +||
            ||+ 0 0 0 0 0 +||   ||+ 0 0 0 0 0 +||   ||+ + 0 0 0 0 +||

          ||+ + + + + 0 0||
          ||+ + + + 0 + 0||
          ||+ + + + 0 0 +||
    P^4 = ||+ + + + 0 0 +||,
          ||+ + + + 0 0 +||
          ||+ + + + 0 0 +||
          ||+ + + + 0 0 +||

          ||+ + + + + + +||
          ||+ + + + + + +||
          ||+ + + + + + +||
    P^8 = ||+ + + + + + +||.
          ||+ + + + + + +||
          ||+ + + + + + +||
          ||+ + + + + + +||

We see that $P^8$ has all strictly positive entries, and therefore $P$ is
regular. The limiting distribution for a similar matrix is computed in
Section 2.2.
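
The bookkeeping with + and 0 amounts to squaring a Boolean pattern. The sketch below is one possible way to automate it; the function names and the stopping rule (give up once the power reaches the square of the number of states, following the remark made just before this example) are choices made for this illustration.

```python
def pattern_square(pat):
    """Square a Boolean pattern: entry (i, j) is True when some k has
    pat[i][k] and pat[k][j] both True."""
    n = len(pat)
    return [[any(pat[i][k] and pat[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def regularity_power(p):
    """Return the first power 1, 2, 4, 8, ... at which every entry is
    positive, or None if no such power is found up to n**2."""
    n = len(p)
    pat = [[x > 0 for x in row] for row in p]
    power = 1
    while True:
        if all(all(row) for row in pat):
            return power
        if power >= n * n:
            return None
        pat = pattern_square(pat)
        power *= 2

# Success runs chain from the example: return to 0 with probability 0.9,
# otherwise advance one state (the last state holds instead of advancing).
N = 7
success_runs = [[0.0] * N for _ in range(N)]
for i in range(N):
    success_runs[i][0] = 0.9
    success_runs[i][min(i + 1, N - 1)] = 0.1

print(regularity_power(success_runs))
print(regularity_power([[0.0, 1.0], [1.0, 0.0]]))  # a periodic chain
```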

Every transition probability matrix on the states $0, 1, \ldots, N$ that
satisfies the following two conditions is regular:

1. For every pair of states $i, j$ there is a path $k_1, \ldots, k_r$ for
   which $P_{ik_1}P_{k_1k_2} \cdots P_{k_rj} > 0$.
2. There is at least one state $i$ for which $P_{ii} > 0$.


Theorem 1.1 Let $P$ be a regular transition probability matrix on the
states $0, 1, \ldots, N$. Then the limiting distribution
$\pi = (\pi_0, \pi_1, \ldots, \pi_N)$ is the unique nonnegative solution of
the equations

$$\pi_j = \sum_{k=0}^{N} \pi_k P_{kj}, \quad j = 0, 1, \ldots, N, \tag{1.2}$$

$$\sum_{k=0}^{N} \pi_k = 1. \tag{1.3}$$

Proof Because the Markov chain is regular, we have a limiting
distribution, $\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$, for which
$\sum_{k=0}^{N} \pi_k = 1$. Write $P^n$ as the matrix product $P^{n-1}P$ in
the form

$$P_{ij}^{(n)} = \sum_{k=0}^{N} P_{ik}^{(n-1)} P_{kj}, \quad j = 0, \ldots, N, \tag{1.4}$$

and now let $n \to \infty$. Then $P_{ij}^{(n)} \to \pi_j$, while
$P_{ik}^{(n-1)} \to \pi_k$, and (1.4) passes into
$\pi_j = \sum_{k=0}^{N} \pi_k P_{kj}$ as claimed.

It remains to show that the solution is unique. Suppose that
$x_0, x_1, \ldots, x_N$ solves

$$x_j = \sum_{k=0}^{N} x_k P_{kj} \quad \text{for } j = 0, \ldots, N \tag{1.5}$$

and

$$\sum_{k=0}^{N} x_k = 1. \tag{1.6}$$

We wish to show that $x_j = \pi_j$, the limiting probability. Begin by
multiplying (1.5) on the right by $P_{jl}$, and then sum over $j$ to get

$$\sum_{j=0}^{N} x_j P_{jl} = \sum_{j=0}^{N} \sum_{k=0}^{N} x_k P_{kj} P_{jl} = \sum_{k=0}^{N} x_k P_{kl}^{(2)}. \tag{1.7}$$

But by (1.5) we have $x_l = \sum_{j=0}^{N} x_j P_{jl}$, whence (1.7) becomes

$$x_l = \sum_{k=0}^{N} x_k P_{kl}^{(2)} \quad \text{for } l = 0, \ldots, N.$$

Repeating this argument $n$ times we deduce that

$$x_l = \sum_{k=0}^{N} x_k P_{kl}^{(n)} \quad \text{for } l = 0, \ldots, N,$$

and then passing to the limit in $n$ and using that
$P_{kl}^{(n)} \to \pi_l$, we see that

$$x_l = \sum_{k=0}^{N} x_k \pi_l, \quad l = 0, \ldots, N.$$

But by (1.6) we have $\sum_k x_k = 1$, whence $x_l = \pi_l$ as claimed.


Example For the social class matrix

               0      1      2
        0 || 0.40   0.50   0.10 ||
    P = 1 || 0.05   0.70   0.25 ||,
        2 || 0.05   0.50   0.45 ||

the equations determining the limiting distribution
$(\pi_0, \pi_1, \pi_2)$ are

$$0.40\pi_0 + 0.05\pi_1 + 0.05\pi_2 = \pi_0, \tag{1.8}$$
$$0.50\pi_0 + 0.70\pi_1 + 0.50\pi_2 = \pi_1, \tag{1.9}$$
$$0.10\pi_0 + 0.25\pi_1 + 0.45\pi_2 = \pi_2, \tag{1.10}$$
$$\pi_0 + \pi_1 + \pi_2 = 1. \tag{1.11}$$

One of the equations (1.8), (1.9), and (1.10) is redundant, because of the
linear constraint $\sum_k P_{ik} = 1$. We arbitrarily strike out (1.10) and
simplify the remaining equations to get

$$-60\pi_0 + 5\pi_1 + 5\pi_2 = 0, \tag{1.12}$$
$$5\pi_0 - 3\pi_1 + 5\pi_2 = 0, \tag{1.13}$$
$$\pi_0 + \pi_1 + \pi_2 = 1. \tag{1.14}$$

We eliminate $\pi_2$ by subtracting (1.12) from (1.13) and from five times
(1.14) to reduce the system to

$$65\pi_0 - 8\pi_1 = 0,$$
$$65\pi_0 = 5.$$

Then $\pi_0 = \frac{5}{65} = \frac{1}{13}$, $\pi_1 = \frac{5}{8}$, and then
$\pi_2 = 1 - \pi_0 - \pi_1 = \frac{31}{104}$, as given earlier.
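
The same elimination can be carried out mechanically. The following sketch solves the balance equations together with the normalization in exact rational arithmetic; the function name `stationary_distribution` and the choice to drop the last balance equation are illustrative decisions, and the routine assumes the chain is regular so that the system has a unique solution.

```python
from fractions import Fraction

def stationary_distribution(p):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination.

    The last balance equation is dropped as redundant and replaced by
    the normalization sum(pi) = 1. Assumes a regular chain.
    """
    n = len(p)
    # Row j (j < n-1): sum_k pi_k * (P[k][j] - delta_kj) = 0.
    rows = [[Fraction(p[k][j]) - (1 if k == j else 0) for k in range(n)]
            + [Fraction(0)] for j in range(n - 1)]
    rows.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

P = [[Fraction(40, 100), Fraction(50, 100), Fraction(10, 100)],
     [Fraction(5, 100), Fraction(70, 100), Fraction(25, 100)],
     [Fraction(5, 100), Fraction(50, 100), Fraction(45, 100)]]

pi = stationary_distribution(P)
print([str(x) for x in pi])
```

Working with `Fraction` avoids rounding, so the output can be compared directly with the fractions obtained by hand.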


A transition probability matrix is called doubly stochastic if the columns
sum to one as well as the rows. Formally, $P = \|P_{ij}\|$ is doubly
stochastic if

$$P_{ij} \ge 0 \quad \text{and} \quad \sum_k P_{ik} = \sum_k P_{kj} = 1 \quad \text{for all } i, j.$$

Consider a doubly stochastic transition probability matrix on the $N$
states $0, 1, \ldots, N - 1$. If the matrix is regular, then the unique
limiting distribution is the uniform distribution
$\pi = (1/N, \ldots, 1/N)$. Because there is only one solution to
$\pi_k = \sum_j \pi_j P_{jk}$ and $\sum_k \pi_k = 1$ when $P$ is regular, we
need only check that $\pi = (1/N, \ldots, 1/N)$ is a solution where $P$ is
doubly stochastic in order to establish the claim. By using the doubly
stochastic feature $\sum_j P_{jk} = 1$ we verify that

$$\sum_j \pi_j P_{jk} = \sum_j \frac{1}{N} P_{jk} = \frac{1}{N} \sum_j P_{jk} = \frac{1}{N} = \pi_k.$$


As an example, let $Y_n$ be the sum of $n$ independent rolls of a fair die
and consider the problem of determining with what probability $Y_n$ is a
multiple of 7 in the long run. Let $X_n$ be the remainder when $Y_n$ is
divided by 7. Then $X_n$ is a Markov chain on the states $0, 1, \ldots, 6$
with transition probability matrix

               0     1     2     3     4     5     6
        0 ||   0    1/6   1/6   1/6   1/6   1/6   1/6 ||
        1 ||  1/6    0    1/6   1/6   1/6   1/6   1/6 ||
        2 ||  1/6   1/6    0    1/6   1/6   1/6   1/6 ||
    P = 3 ||  1/6   1/6   1/6    0    1/6   1/6   1/6 ||.
        4 ||  1/6   1/6   1/6   1/6    0    1/6   1/6 ||
        5 ||  1/6   1/6   1/6   1/6   1/6    0    1/6 ||
        6 ||  1/6   1/6   1/6   1/6   1/6   1/6    0  ||

The matrix is doubly stochastic, and it is regular ($P^2$ has only strictly
positive entries), hence the limiting distribution is
$\pi = (\frac{1}{7}, \ldots, \frac{1}{7})$. Furthermore, $Y_n$ is a multiple
of 7 if and only if $X_n = 0$. Thus the limiting probability that $Y_n$ is a
multiple of 7 is $\frac{1}{7}$.
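
A short computation makes the convergence to 1/7 visible. The sketch below builds the remainder chain in exact rational arithmetic and tracks the probability that the running sum is a multiple of 7 after n rolls, starting from remainder 0. The helper names are illustrative.

```python
from fractions import Fraction

N_STATES = 7

# One roll adds 1..6, so the remainder moves to each of the six other
# states with probability 1/6 and never stays where it is.
P = [[Fraction(0) if i == j else Fraction(1, 6) for j in range(N_STATES)]
     for i in range(N_STATES)]

def step(dist, p):
    """Advance a row-vector distribution by one transition."""
    n = len(p)
    return [sum(dist[k] * p[k][j] for k in range(n)) for j in range(n)]

def distribution_after(n_rolls, start=0):
    """Distribution of the remainder after n_rolls, given the starting state."""
    dist = [Fraction(int(s == start)) for s in range(N_STATES)]
    for _ in range(n_rolls):
        dist = step(dist, P)
    return dist

# Doubly stochastic check: every column sums to one.
column_sums = [sum(P[i][j] for i in range(N_STATES)) for j in range(N_STATES)]

for n in (1, 2, 5, 10):
    print(n, float(distribution_after(n)[0]))
print("1/7 =", float(Fraction(1, 7)))
```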


1.2 Interpretation of the Limiting Distribution


Given a regular transition matrix $P$ for a Markov process $\{X_n\}$ on the
$N + 1$ states $0, 1, \ldots, N$, we solve the linear equations

$$\pi_j = \sum_{k=0}^{N} \pi_k P_{kj} \quad \text{for } j = 0, 1, \ldots, N$$

and

$$\pi_0 + \pi_1 + \cdots + \pi_N = 1.$$

The primary interpretation of the solution $(\pi_0, \ldots, \pi_N)$ is as
the limiting distribution

$$\pi_j = \lim_{n \to \infty} P_{ij}^{(n)} = \lim_{n \to \infty} \Pr\{X_n = j \mid X_0 = i\}.$$

In words, after the process has been in operation for a long duration, the
probability of finding the process in state $j$ is $\pi_j$, irrespective of
the starting state.

There is a second interpretation of the limiting distribution
$\pi = (\pi_0, \pi_1, \ldots, \pi_N)$ that plays a major role in many
models. We claim that $\pi_j$ also gives the long run mean fraction of time
that the process $\{X_n\}$ is in state $j$. Thus if each visit to state $j$
incurs a "cost" of $c_j$, then the long run mean cost per unit time
associated with this Markov chain is

$$\text{Long run mean cost per unit time} = \sum_{j=0}^{N} \pi_j c_j.$$
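To verify this interpretation, recall that if a sequence $a_0, a_1, \ldots$
of real numbers converges to a limit $a$, then the averages of these numbers
also converge in the manner

$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} a_k = a.$$

We apply this result to the convergence
$\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$ to conclude that

$$\lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} P_{ij}^{(k)} = \pi_j.$$

Now, $(1/m) \sum_{k=0}^{m-1} P_{ij}^{(k)}$ is exactly the mean fraction of
time during steps $0, 1, \ldots, m - 1$ that the process spends in state
$j$. Indeed, the actual (random) fraction of time in state $j$ is

$$\frac{1}{m} \sum_{k=0}^{m-1} \mathbf{1}\{X_k = j\},$$

where

$$\mathbf{1}\{X_k = j\} = \begin{cases} 1 & \text{if } X_k = j, \\ 0 & \text{if } X_k \ne j. \end{cases}$$

Therefore the mean fraction of visits is obtained by taking expected values
according to

$$E\left[\frac{1}{m} \sum_{k=0}^{m-1} \mathbf{1}\{X_k = j\} \,\Big|\, X_0 = i\right]
= \frac{1}{m} \sum_{k=0}^{m-1} E\left[\mathbf{1}\{X_k = j\} \mid X_0 = i\right]
= \frac{1}{m} \sum_{k=0}^{m-1} \Pr\{X_k = j \mid X_0 = i\}
= \frac{1}{m} \sum_{k=0}^{m-1} P_{ij}^{(k)}.$$

Because $\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$, the long run mean
fraction of time that the process spends in state $j$ is

$$\lim_{m \to \infty} E\left[\frac{1}{m} \sum_{k=0}^{m-1} \mathbf{1}\{X_k = j\} \,\Big|\, X_0 = i\right]
= \lim_{m \to \infty} \frac{1}{m} \sum_{k=0}^{m-1} P_{ij}^{(k)} = \pi_j,$$

independent of the starting state $i$.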
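
The averaging argument can be illustrated numerically. The sketch below reuses the social class matrix from Section 1 and computes the mean occupation fractions over the first m steps for each starting state. The cost vector is purely hypothetical and is included only to show how a long run mean cost would be assembled from the limiting distribution.

```python
def occupation_fractions(p, start, m):
    """Mean fraction of steps 0..m-1 spent in each state, given X_0 = start.

    Computes (1/m) * sum over k < m of P_ij^(k) by carrying the row
    vector of k-step probabilities forward one step at a time.
    """
    n = len(p)
    dist = [1.0 if s == start else 0.0 for s in range(n)]
    totals = [0.0] * n
    for _ in range(m):
        totals = [t + d for t, d in zip(totals, dist)]
        dist = [sum(dist[k] * p[k][j] for k in range(n)) for j in range(n)]
    return [t / m for t in totals]

P = [[0.40, 0.50, 0.10],
     [0.05, 0.70, 0.25],
     [0.05, 0.50, 0.45]]
PI = (1 / 13, 5 / 8, 31 / 104)   # limiting distribution from Section 1
COSTS = (1.0, 2.0, 3.0)          # hypothetical cost per visit to each state

for start in range(3):
    print(start, [round(x, 4) for x in occupation_fractions(P, start, 5000)])

long_run_cost = sum(pi_j * c_j for pi_j, c_j in zip(PI, COSTS))
print("long run mean cost per unit time:", round(long_run_cost, 4))
```

The three printed rows should nearly coincide, reflecting the independence from the starting state.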


Exercises 


1.1. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

              0     1     2
        0 || 0.7   0.2   0.1 ||
    P = 1 ||  0    0.6   0.4 ||.
        2 || 0.5    0    0.5 ||

Determine the limiting distribution. 


1.2. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

              0     1     2
        0 || 0.6   0.3   0.1 ||
    P = 1 || 0.3   0.3   0.4 ||.
        2 || 0.4   0.1   0.5 ||

Determine the limiting distribution. 


1.3. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

              0     1     2
        0 || 0.1   0.1   0.8 ||
    P = 1 || 0.2   0.2   0.6 ||.
        2 || 0.3   0.3   0.4 ||

What fraction of time, in the long run, does the process spend in state 1? 


1.4. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability
matrix

              0     1     2
        0 || 0.3   0.2   0.5 ||
    P = 1 || 0.5   0.1   0.4 ||.
        2 || 0.5   0.2   0.3 ||

Every period that the process spends in state 0 incurs a cost of $2. Every 
period that the process spends in state 1 incurs a cost of $5. Every period 
that the process spends in state 2 incurs a cost of $3. What is the long run 
cost per period associated with this Markov chain? 


1.5. Consider the Markov chain whose transition probability matrix is
given by

              0     1     2     3
        0 || 0.1   0.5    0    0.4 ||
    P = 1 ||  0     0     1     0  ||.
        2 ||  0     0     0     1  ||
        3 ||  1     0     0     0  ||

Determine the limiting distribution for the process. 


1.6. Compute the limiting distribution for the transition probability 


matrix 
2 


0 


Nie 
eo) ees Cee ed ay 


wafi— wai 


1.7. A Markov chain on the states 0, 1, 2, 3 has the transition
probability matrix

              0     1     2     3
        0 || 0.1   0.2   0.3   0.4 ||
    P = 1 ||  0    0.3   0.3   0.4 ||.
        2 ||  0     0    0.6   0.4 ||
        3 ||  1     0     0     0  ||

Determine the corresponding limiting distribution.


1.8. Suppose that the social classes of successive generations in a fam- 
ily follow a Markov chain with transition probability matrix given by 


                            Son's Class
                      Lower   Middle   Upper
    Father's  Lower  || 0.7     0.2     0.1 ||
    Class     Middle || 0.2     0.6     0.2 ||
              Upper  || 0.1     0.4     0.5 ||


What fraction of families are upper class in the long run? 


1.9. Determine the limiting distribution for the Markov chain whose 
transition probability matrix is 


O 1 2 
Ol|: 3 0 
Pails 3 3 
20 3 3 


1.10. A bus in a mass transit system is operating on a continuous route
with intermediate stops. The arrival of the bus at a stop is classified into
one of three states, namely

1. Early arrival;
2. On-time arrival;
3. Late arrival.

Suppose that the successive states form a Markov chain with transition
probability matrix

              1     2     3
        1 || 0.5   0.4   0.1 ||
    P = 2 || 0.2   0.5   0.3 ||.
        3 || 0.1   0.2   0.7 ||

Over a long period of time, what fraction of stops can be expected to be late? 


Problems 


1.1. Five balls are distributed between two urns, labeled A and B. Each 
period, an urn is selected at random, and if it is not empty, a ball from that 
urn is removed and placed into the other urn. In the long run, what frac- 
tion of time is urn A empty? 


1.2. Five balls are distributed between two urns, labeled A and B. Each 
period, one of the five balls is selected at random, and whichever urn it’s 
in, it is moved to the other urn. In the long run, what fraction of time is
urn A empty?


1.3. A Markov chain has the transition probability matrix

               0     1     2     3     4     5
        0 ||  α_1   α_2   α_3   α_4   α_5   α_6 ||
        1 ||   1     0     0     0     0     0  ||
    P = 2 ||   0     1     0     0     0     0  ||,
        3 ||   0     0     1     0     0     0  ||
        4 ||   0     0     0     1     0     0  ||
        5 ||   0     0     0     0     1     0  ||

where $\alpha_i \ge 0$, $i = 1, \ldots, 6$, and
$\alpha_1 + \cdots + \alpha_6 = 1$. Determine the limiting probability of
being in state 0.


1.4. A finite state regular Markov chain has transition probability matrix
$P = \|P_{ij}\|$ and limiting distribution $\pi = \|\pi_i\|$. In the long
run, what fraction of the transitions are from a prescribed state $k$ to a
prescribed state $m$?


1.5. The four towns A, B, C, and D are connected by railroad lines as 
shown in the following diagram: 


Figure 1.1. A graph whose nodes represent towns and whose arcs represent 
railroad lines. 


Each day, in whichever town it is in, a train chooses one of the lines out 
of that town at random and traverses it to the next town, where the process 
repeats the next day. In the long run, what is the probability of finding the 
train in town D? 


1.6. Determine the following limits in terms of the transition probability
matrix $P = \|P_{ij}\|$ and limiting distribution $\pi = \|\pi_j\|$ of a
finite state regular Markov chain $\{X_n\}$:

(a) $\lim_{n \to \infty} \Pr\{X_{n+1} = j \mid X_0 = i\}$.
(b) $\lim_{n \to \infty} \Pr\{X_n = k, X_{n+1} = j \mid X_0 = i\}$.
(c) $\lim_{n \to \infty} \Pr\{X_{n-1} = k, X_n = j \mid X_0 = i\}$.


1.7. Determine the limiting distribution for the Markov chain whose 
transition probability matrix is 


0 1 2 3 
Ol: 0 0: 
lil O O O 
P = 

20 $4 3 
30 0 1 QO 


1.8. Show that the transition probability matrix


0123 4 
o10 4 ! 0 0 
14 0 £00 
P=2]); 3 0 3; 0 
310 0 4 0 3 
41: 00 : 0 


is regular and compute the limiting distribution. 


1.9. Determine the long run, or limiting, distribution for the Markov 
chain whose transition probability matrix is 


0 1 2 3 
O1}0 O 1 O 
170 0 0 1 
eS 2it bob 4 
3llo o : 3 


1.10. Consider a Markov chain with transition probability matrix

        || p_0       p_1     p_2    ...   p_N     ||
        || p_N       p_0     p_1    ...   p_{N-1} ||
    P = || p_{N-1}   p_N     p_0    ...   p_{N-2} ||,
        ||  ...                                   ||
        || p_1       p_2     p_3    ...   p_0     ||

where $0 < p_0 < 1$ and $p_0 + p_1 + \cdots + p_N = 1$. Determine the
limiting distribution.


1.11. Suppose that a production process changes state according to a
Markov process whose transition probability matrix is given by

              0     1     2     3
        0 || 0.3   0.5    0    0.2 ||
    P = 1 || 0.5   0.2   0.2   0.1 ||.
        2 || 0.2   0.3   0.4   0.1 ||
        3 || 0.1   0.2   0.4   0.3 ||

It is known that $\pi_1 = \frac{119}{379} = 0.3140$ and
$\pi_2 = \frac{81}{379} = 0.2137$.

(a) Determine the limiting probabilities $\pi_0$ and $\pi_3$.

(b) Suppose that states 0 and 1 are “In-Control” while states 2 and 3 are 
deemed “Out-of-Control.” In the long run, what fraction of time is 
the process Out-of-Control? 

(c) In the long run, what fraction of transitions are from an In-Control 
state to an Out-of-Control state? 


1.12. Let $P$ be the transition probability matrix of a finite state
regular Markov chain, and let $\Pi$ be the matrix whose rows are the
stationary distribution $\pi$. Define $Q = P - \Pi$.

(a) Show that $P^n = \Pi + Q^n$.
(b) When 


Nie NI 
Ni—- si © 


Oo st ve 


1 
2 
obtain an explicit expression for $Q^n$ and then for $P^n$.


1.13. A Markov chain has the transition probability matrix

              0     1     2
        0 || 0.4   0.4   0.2 ||
    P = 1 || 0.6   0.2   0.2 ||.
        2 || 0.4   0.2   0.4 ||

After a long period of time, you observe the chain and see that it is in
state 1. What is the conditional probability that the previous state was
state 2? That is, find

$$\lim_{n \to \infty} \Pr\{X_{n-1} = 2 \mid X_n = 1\}.$$

2. Examples 


Markov chains arising in meteorology, reliability, statistical quality con- 
trol, and management science are presented next, and the long run behav- 
ior of each Markov chain is developed and interpreted in terms of the phe- 
nomenon under study. 


2.1. Including History in the State Description 


Often a phenomenon that is not naturally a Markov process can be mod- 
eled as a Markov process by including part of the past history in the state 
description. To illustrate this technique, we suppose that the weather on 
any day depends on the weather conditions for the previous two days. To 
be exact, we suppose that if it was sunny today and yesterday, then it will 
be sunny tomorrow with probability 0.8; if it was sunny today but cloudy 
yesterday, then it will be sunny tomorrow with probability 0.6; if it was 
cloudy today but sunny yesterday, then it will be sunny tomorrow with 
probability 0.4; if it was cloudy for the last two days, then it will be sunny 
tomorrow with probability 0.1. 

Such a model can be transformed into a Markov chain, provided we say 
that the state at any time is determined by the weather conditions during 
both that day and the previous day. We say the process is in 


State (S, S) if it was sunny both today and yesterday,
State (S, C) if it was sunny yesterday but cloudy today,
State (C, S) if it was cloudy yesterday but sunny today,
State (C, C) if it was cloudy both today and yesterday.

Then the transition probability matrix is

                                  Today's State
                          (S, S)  (S, C)  (C, S)  (C, C)
                 (S, S) ||  0.8     0.2                  ||
    Yesterday's  (S, C) ||                  0.4     0.6  ||
    State        (C, S) ||  0.6     0.4                  ||
                 (C, C) ||                  0.1     0.9  ||

The equations determining the limiting distribution are

$$0.8\pi_0 + 0.6\pi_2 = \pi_0,$$
$$0.2\pi_0 + 0.4\pi_2 = \pi_1,$$
$$0.4\pi_1 + 0.1\pi_3 = \pi_2,$$
$$0.6\pi_1 + 0.9\pi_3 = \pi_3,$$
$$\pi_0 + \pi_1 + \pi_2 + \pi_3 = 1.$$

Again, one of the top four equations is redundant. Striking out the first
equation and solving the remaining four equations gives
$\pi_0 = \frac{3}{11}$, $\pi_1 = \frac{1}{11}$, $\pi_2 = \frac{1}{11}$, and
$\pi_3 = \frac{6}{11}$.

We recover the fraction of days, in the long run, on which it is sunny by
summing the appropriate terms in the limiting distribution. It can be sunny
today in conjunction with either being sunny or cloudy tomorrow. Therefore,
the long run fraction of days in which it is sunny is
$\pi_0 + \pi_1 = \pi(S, S) + \pi(S, C) = \frac{4}{11}$. Formally,
$\lim_{n \to \infty} \Pr\{X_n = S\} = \lim_{n \to \infty} [\Pr\{X_n = S, X_{n+1} = S\} + \Pr\{X_n = S, X_{n+1} = C\}] = \pi_0 + \pi_1$.
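
As a sanity check on these fractions, the limiting distribution of the four-state weather chain can be approximated by repeatedly applying the transition matrix to an arbitrary starting distribution. This is a minimal sketch; the function name `power_iteration` and the number of steps are illustrative choices, and the method relies on the chain being regular.

```python
STATES = [("S", "S"), ("S", "C"), ("C", "S"), ("C", "C")]

# Rows: current state (yesterday, today); columns: next state (today, tomorrow).
P = [[0.8, 0.2, 0.0, 0.0],
     [0.0, 0.0, 0.4, 0.6],
     [0.6, 0.4, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.9]]

def power_iteration(p, steps=2000):
    """Approximate the limiting distribution by iterating dist <- dist P."""
    n = len(p)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[k] * p[k][j] for k in range(n)) for j in range(n)]
    return dist

pi = power_iteration(P)
for state, prob in zip(STATES, pi):
    print(state, round(prob, 6))

# Long run fraction of sunny days: pi(S, S) + pi(S, C).
sunny = pi[0] + pi[1]
print("sunny fraction:", round(sunny, 6))
```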


2.2. Reliability and Redundancy 


An airline reservation system has two computers, only one of which is in 
operation at any given time. A computer may break down on any given 
day with probability p. There is a single repair facility that takes 2 days to 
restore a computer to normal. The facilities are such that only one com- 
puter at a time can be dealt with. Form a Markov chain by taking as states 
the pairs (x, y) where x is the number of machines in operating condition 
at the end of a day and y is 1 if a day’s labor has been expended on a ma- 
chine not yet repaired and 0 otherwise. The transition matrix is 


                                 To State
                      (2, 0)   (1, 0)   (1, 1)   (0, 1)
    From State
          (2, 0)    ||   q        p        0        0   ||
    P =   (1, 0)    ||   0        0        q        p   ||,
          (1, 1)    ||   q        p        0        0   ||
          (0, 1)    ||   0        1        0        0   ||

where $p + q = 1$.

We are interested in the long run probability that both machines are
inoperative. Let $(\pi_0, \pi_1, \pi_2, \pi_3)$ be the limiting distribution
of the Markov chain. Then the long run probability that neither computer is
operating is $\pi_3$, and the availability, the probability that at least one
computer is operating, is $1 - \pi_3 = \pi_0 + \pi_1 + \pi_2$.

The equations for the limiting distribution are

$$q\pi_0 + q\pi_2 = \pi_0,$$
$$p\pi_0 + p\pi_2 + \pi_3 = \pi_1,$$
$$q\pi_1 = \pi_2,$$
$$p\pi_1 = \pi_3,$$

and

$$\pi_0 + \pi_1 + \pi_2 + \pi_3 = 1.$$

The solution is

$$\pi_0 = \frac{q^2}{1 + p^2}, \qquad \pi_2 = \frac{qp}{1 + p^2},$$
$$\pi_1 = \frac{p}{1 + p^2}, \qquad \pi_3 = \frac{p^2}{1 + p^2}.$$

The availability is $R_1 = 1 - \pi_3 = 1/(1 + p^2)$.

In order to increase the system availability, it is proposed to add a
duplicate repair facility so that both computers can be repaired
simultaneously. The corresponding transition matrix is now

                                 To State
                      (2, 0)   (1, 0)   (1, 1)   (0, 1)
    From State
          (2, 0)    ||   q        p        0        0   ||
    P =   (1, 0)    ||   0        0        q        p   ||,
          (1, 1)    ||   q        p        0        0   ||
          (0, 1)    ||   0        0        1        0   ||

and the limiting distribution is

$$\pi_0 = \frac{q}{1 + p + p^2}, \qquad \pi_2 = \frac{p}{1 + p + p^2},$$
$$\pi_1 = \frac{p}{1 + p + p^2}, \qquad \pi_3 = \frac{p^2}{1 + p + p^2}.$$

Thus availability has increased to
$R_2 = 1 - \pi_3 = (1 + p)/(1 + p + p^2)$.
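
The two closed-form solutions can be verified directly by substituting them into the balance equations. The sketch below does this in exact arithmetic for one assumed breakdown probability, p = 1/10, which is an illustrative value not taken from the text; the function names are likewise hypothetical.

```python
from fractions import Fraction

def is_stationary(pi, p):
    """Check that pi = pi P holds exactly and that pi sums to one."""
    n = len(p)
    balanced = all(sum(pi[k] * p[k][j] for k in range(n)) == pi[j]
                   for j in range(n))
    return balanced and sum(pi) == 1

def one_repair_facility(p):
    """Transition matrix and stated solution with a single repair facility."""
    q = 1 - p
    matrix = [[q, p, 0, 0], [0, 0, q, p], [q, p, 0, 0], [0, 1, 0, 0]]
    d = 1 + p * p
    return matrix, [q * q / d, p / d, q * p / d, p * p / d]

def two_repair_facilities(p):
    """Transition matrix and stated solution with a duplicate repair facility."""
    q = 1 - p
    matrix = [[q, p, 0, 0], [0, 0, q, p], [q, p, 0, 0], [0, 0, 1, 0]]
    d = 1 + p + p * p
    return matrix, [q / d, p / d, p / d, p * p / d]

p = Fraction(1, 10)  # assumed daily breakdown probability
m1, pi1 = one_repair_facility(p)
m2, pi2 = two_repair_facilities(p)

R1 = 1 - pi1[3]  # availability with one repair facility
R2 = 1 - pi2[3]  # availability with two repair facilities
print(is_stationary(pi1, m1), is_stationary(pi2, m2))
print("R1 =", R1, " R2 =", R2)
```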


2.3. A Continuous Sampling Plan 


Consider a production line where each item has probability p of being 
defective. Assume that the condition of a particular item (defective or 
nondefective) does not depend on the conditions of other items. The fol- 
lowing sampling plan is used. 

Initially every item is sampled as it is produced; this procedure contin- 
ues until i consecutive nondefective items are found. Then the sampling 
plan calls for sampling only one out of every r items at random until a de- 
fective one is found. When this happens the plan calls for reverting to 100 
percent sampling until i consecutive nondefective items are found. The 
process continues in the same way. 

State $E_k$ ($k = 0, 1, \ldots, i - 1$) denotes that $k$ consecutive
nondefective items have been found in the 100 percent sampling portion of
the plan, while state $E_i$ denotes that the plan is in the second stage
(sampling one out of $r$). Time $m$ is considered to follow the $m$th item,
whether sampled or not. Then the sequence of states is a Markov chain with

$$P_{jk} = \Pr\{\text{in state } E_k \text{ after } m + 1 \text{ items} \mid \text{in state } E_j \text{ after } m \text{ items}\}$$

$$= \begin{cases}
p & \text{for } k = 0,\ 0 \le j < i, \\
1 - p & \text{for } k = j + 1 \le i, \\
\dfrac{p}{r} & \text{for } k = 0,\ j = i, \\
1 - \dfrac{p}{r} & \text{for } k = j = i, \\
0 & \text{otherwise.}
\end{cases}$$

Let $\pi_k$ be the limiting probability that the system is in state $E_k$ for
$k = 0, 1, \ldots, i$. The equations determining these limiting probabilities are

$$
\begin{aligned}
(0)\quad & p\pi_0 + p\pi_1 + \cdots + p\pi_{i-1} + (p/r)\pi_i = \pi_0, \\
(1)\quad & (1 - p)\pi_0 = \pi_1, \\
(2)\quad & (1 - p)\pi_1 = \pi_2, \\
& \qquad \vdots \\
(i)\quad & (1 - p)\pi_{i-1} + (1 - p/r)\pi_i = \pi_i,
\end{aligned}
$$

together with

$$
(*)\quad \pi_0 + \pi_1 + \cdots + \pi_i = 1.
$$

From equations (1) through (i) we deduce that $\pi_k = (1 - p)\pi_{k-1}$, so that
$\pi_k = (1 - p)^k \pi_0$ for $k = 0, \ldots, i - 1$, while equation (i) yields
$\pi_i = (r/p)(1 - p)\pi_{i-1}$ or $\pi_i = (r/p)(1 - p)^i \pi_0$. Having determined $\pi_k$ in terms
of $\pi_0$ for $k = 0, \ldots, i$, we place these values in (*) to obtain

$$
\{[1 + (1 - p) + \cdots + (1 - p)^{i-1}] + (r/p)(1 - p)^i\}\,\pi_0 = 1.
$$
The geometric series simplifies, and after elementary algebra the solution
is

$$
\pi_0 = \frac{p}{1 + (r - 1)(1 - p)^i},
$$

whence

$$
\pi_k = \frac{p(1 - p)^k}{1 + (r - 1)(1 - p)^i} \qquad \text{for } k = 0, \ldots, i - 1,
$$

while

$$
\pi_i = \frac{r(1 - p)^i}{1 + (r - 1)(1 - p)^i}.
$$


Let AFI (Average Fraction Inspected) denote the long run fraction of
items that are inspected. Since each item is inspected while in states
$E_0, \ldots, E_{i-1}$ but only one out of r is inspected in state $E_i$, we have

$$
\begin{aligned}
\mathrm{AFI} &= (\pi_0 + \cdots + \pi_{i-1}) + (1/r)\pi_i \\
&= (1 - \pi_i) + (1/r)\pi_i \\
&= \frac{1}{1 + (r - 1)(1 - p)^i}.
\end{aligned}
$$


Let us assume that each item found to be defective is replaced by an 
item known to be good. The average outgoing quality (AOQ) is defined to 
be the fraction of defectives in the output of such an inspection scheme. 
The average fraction not inspected is

$$
1 - \mathrm{AFI} = \frac{(r - 1)(1 - p)^i}{1 + (r - 1)(1 - p)^i},
$$

and of these on the average p are defective. Hence

$$
\mathrm{AOQ} = \frac{(r - 1)(1 - p)^i\, p}{1 + (r - 1)(1 - p)^i}.
$$

This average outgoing quality is zero if p = 0 or p = 1, and rises to a
maximum at some intermediate value, as shown in Figure 2.1. The maximum
AOQ is called the average outgoing quality limit (AOQL), and it has been
determined numerically and tabulated as a function of i and r.

This quality control scheme guarantees an output quality better than the 
AOQL regardless of the input fraction defective, as shown in Figure 2.2. 
[Figure 2.1 The average outgoing quality (AOQ) as a function of the input
quality level p]

[Figure 2.2 Black-box picture of a continuous inspection scheme as a method
of guaranteeing outgoing quality. Input of arbitrary quality p passes through
continuous sampling inspection; the output has guaranteed quality AOQL.]
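
The closed forms for AFI and AOQ are easy to evaluate numerically. The following is a minimal sketch, not part of the original plan description: the function names `sampling_plan` and `approximate_aoql`, the grid search used to locate the maximum, and the values p = 0.05, i = 5, r = 10 are all illustrative assumptions.

```python
def sampling_plan(p, i, r):
    """Return (AFI, AOQ) for defect probability p, run length i, and 1-in-r sampling."""
    tail = (r - 1) * (1.0 - p) ** i          # (r - 1)(1 - p)^i
    afi = 1.0 / (1.0 + tail)                 # average fraction inspected
    aoq = tail * p / (1.0 + tail)            # average outgoing quality
    return afi, aoq


def approximate_aoql(i, r, grid=10_000):
    """Approximate the AOQL (maximum of AOQ over p in [0, 1]) by a simple grid search."""
    best_p, best_aoq = 0.0, 0.0
    for n in range(grid + 1):
        p = n / grid
        aoq = sampling_plan(p, i, r)[1]
        if aoq > best_aoq:
            best_p, best_aoq = p, aoq
    return best_p, best_aoq


if __name__ == "__main__":
    afi, aoq = sampling_plan(p=0.05, i=5, r=10)   # illustrative values only
    print(f"AFI = {afi:.4f}, AOQ = {aoq:.4f}")
    p_star, limit = approximate_aoql(i=5, r=10)
    print(f"AOQL is about {limit:.4f}, attained near p = {p_star:.4f}")
```

Note that AOQ = p(1 - AFI), so the two quantities never need to be computed independently; the grid search is only a rough stand-in for the tabulated AOQL values mentioned in the text.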


2.4. Age Replacement Policies 


A component of a computer has an active life, measured in discrete units,
that is a random variable T, where $\Pr\{T = k\} = a_k$ for $k = 1, 2, \ldots$. Suppose
one starts with a fresh component, and each component is replaced
by a new component upon failure. Let $X_n$ be the age of the component in
service at time n. Then $\{X_n\}$ is a success runs Markov chain. (See III,
Section 5.4.)

In an attempt to replace components before they fail in service, an age 
replacement policy is instituted. This policy calls for replacing the com- 
ponent upon its failure or upon its reaching age N, whichever occurs first. 
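Under this age replacement policy, the Markov chain $\{X_n\}$ has the
transition probability matrix

$$
P = \begin{array}{c|cccccc}
 & 0 & 1 & 2 & 3 & \cdots & N-1 \\ \hline
0 & p_0 & 1 - p_0 & 0 & 0 & \cdots & 0 \\
1 & p_1 & 0 & 1 - p_1 & 0 & \cdots & 0 \\
2 & p_2 & 0 & 0 & 1 - p_2 & \cdots & 0 \\
\vdots & \vdots & & & & & \vdots \\
N-1 & 1 & 0 & 0 & 0 & \cdots & 0
\end{array},
$$

where

$$
p_k = \frac{a_{k+1}}{a_{k+1} + a_{k+2} + \cdots} \qquad \text{for } k = 0, 1, \ldots, N - 2.
$$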


State 0 corresponds to a new component, and therefore the limiting
probability $\pi_0$ corresponds to the long run probability of replacement
during any single time unit, or the long run replacement per unit time. Some
of these replacements are planned or age replacements, and some
correspond to failures in service. A planned replacement occurs in each period
for which $X_n = N - 1$, and therefore the long run planned replacements
per unit time is the limiting probability $\pi_{N-1}$. The difference $\pi_0 - \pi_{N-1}$ is
the long run rate of failures in service. The equations for the limiting
distribution $\pi = (\pi_0, \pi_1, \ldots, \pi_{N-1})$ are

$$
\begin{aligned}
p_0\pi_0 + p_1\pi_1 + \cdots + p_{N-2}\pi_{N-2} + \pi_{N-1} &= \pi_0, \\
(1 - p_0)\pi_0 &= \pi_1, \\
(1 - p_1)\pi_1 &= \pi_2, \\
&\ \,\vdots \\
(1 - p_{N-2})\pi_{N-2} &= \pi_{N-1}, \\
\pi_0 + \pi_1 + \cdots + \pi_{N-1} &= 1.
\end{aligned}
$$


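Solving in terms of $\pi_0$, we obtain

$$
\begin{aligned}
\pi_0 &= \pi_0, \\
\pi_1 &= (1 - p_0)\pi_0, \\
\pi_2 &= (1 - p_1)\pi_1 = (1 - p_1)(1 - p_0)\pi_0, \\
&\ \,\vdots \\
\pi_k &= (1 - p_{k-1})\pi_{k-1} = (1 - p_{k-1})(1 - p_{k-2}) \cdots (1 - p_0)\pi_0, \\
&\ \,\vdots \\
\pi_{N-1} &= (1 - p_{N-2})\pi_{N-2} = (1 - p_{N-2})(1 - p_{N-3}) \cdots (1 - p_0)\pi_0,
\end{aligned}
$$

and since $\pi_0 + \pi_1 + \cdots + \pi_{N-1} = 1$, we have

$$
1 = [1 + (1 - p_0) + (1 - p_0)(1 - p_1) + \cdots + (1 - p_0)(1 - p_1) \cdots (1 - p_{N-2})]\,\pi_0,
$$

or

$$
\pi_0 = \frac{1}{1 + (1 - p_0) + (1 - p_0)(1 - p_1) + \cdots + (1 - p_0)(1 - p_1) \cdots (1 - p_{N-2})}.
$$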


If $A_j = a_j + a_{j+1} + \cdots$ for $j = 1, 2, \ldots$, where $A_1 = 1$, then $p_k = a_{k+1}/A_{k+1}$
and $1 - p_k = A_{k+2}/A_{k+1}$, which simplifies the expression for $\pi_0$ to

$$
\pi_0 = \frac{1}{A_1 + A_2 + \cdots + A_N},
$$

and then

$$
\pi_{N-1} = A_N \pi_0 = \frac{A_N}{A_1 + A_2 + \cdots + A_N}.
$$


In practice, one determines the cost C of a replacement and the additional
cost K that is incurred when a failure in service occurs. Then the
long run total cost per unit time is $C\pi_0 + K(\pi_0 - \pi_{N-1})$, and the
replacement age N is chosen so as to minimize this total cost per unit time.

Observe that

$$
\begin{aligned}
\frac{1}{\pi_0} = A_1 + A_2 + \cdots + A_N &= \sum_{j=1}^{N} \Pr\{T \ge j\} = \sum_{k=0}^{N-1} \Pr\{T > k\} \\
&= \sum_{k=0}^{\infty} \Pr\{\min\{T, N\} > k\} = E[\min\{T, N\}].
\end{aligned}
$$

In words, the reciprocal of the mean time between replacements
$E[\min\{T, N\}]$ yields the long run replacements per unit time $\pi_0$. This
relation will be further explored in the chapter on renewal processes.
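
The choice of the replacement age N can be sketched numerically from the formulas $\pi_0 = 1/(A_1 + \cdots + A_N)$ and $\pi_{N-1} = A_N \pi_0$. In the sketch below, the lifetime distribution `a`, the costs `C` and `K`, and the function name `age_replacement` are illustrative assumptions, not data from the text.

```python
def age_replacement(a, N, C, K):
    """Long run rates and cost for replacement age N.

    a[k - 1] = Pr{T = k} for k = 1, 2, ..., len(a); requires 1 <= N <= len(a).
    Returns (pi0, planned_rate, failure_rate, cost_per_unit_time).
    """
    # A_j = Pr{T >= j} = a_j + a_{j+1} + ... for j = 1, ..., N
    A = [sum(a[j - 1:]) for j in range(1, N + 1)]
    pi0 = 1.0 / sum(A)                 # long run replacements per unit time
    planned = A[N - 1] * pi0           # pi_{N-1}: planned (age) replacements
    failures = pi0 - planned           # failures in service
    cost = C * pi0 + K * failures
    return pi0, planned, failures, cost


# Hypothetical lifetime distribution and costs, chosen only for illustration.
a = [0.1, 0.2, 0.3, 0.4]
C, K = 1.0, 2.0

costs = {N: age_replacement(a, N, C, K)[3] for N in range(1, len(a) + 1)}
best_N = min(costs, key=costs.get)

for N, cost in costs.items():
    print(f"N = {N}: cost per unit time = {cost:.4f}")
print("cost-minimizing replacement age:", best_N)
```

With N = 1 every component is replaced after one period, so the cost per unit time is simply C; with N equal to the largest possible lifetime, $1/\pi_0$ reduces to the mean life E[T].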


2.5. Optimal Replacement Rules 


A common industrial activity is the periodic inspection of some system as 
part of a procedure for keeping it operative. After each inspection, a deci- 
sion must be made whether or not to alter the system at that time. If the 
inspection procedure and the ways of modifying the system are fixed, an
important problem is that of determining, according to some cost criterion, 
the optimal rule for making the appropriate decision. Here we consider the 
case in which the only possible act is to replace the system with a new one. 

Suppose that the system is inspected at equally spaced points in time
and that after each inspection it is classified into one of the L + 1 possible
states 0, 1, ..., L. A system is in state 0 if and only if it is new and is
in state L if and only if it is inoperative. Let the inspection times be n =
0, 1, ..., and let $X_n$ denote the observed state of the system at time n. In
the absence of replacement we assume that $\{X_n\}$ is a Markov chain with
transition probabilities $p_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\}$ for all i, j, and n.

It is possible to replace the system at any time before failure. The mo- 
tivation for doing so may be to avoid the possible consequences of further 
deterioration or of failure of the system. A replacement rule, denoted by 
R, is a specification of those states at which the system will be replaced. 
Replacement takes place at the next inspection time. A replacement rule R 
modifies the behavior of the system and results in a modified Markov
chain $\{X_n(R); n = 0, 1, \ldots\}$. The corresponding modified transition
probabilities $p_{ij}(R)$ are given by

$$
p_{ij}(R) = p_{ij} \qquad \text{if the system is not replaced at state } i,
$$

$$
p_{i0}(R) = 1,
$$

and

$$
p_{ij}(R) = 0, \quad j \ne 0 \qquad \text{if the system is replaced at state } i.
$$


It is assumed that each time the equipment is replaced, a replacement
cost of K units is incurred. Further it is assumed that each unit of time the
system is in state j incurs an operating cost of $a_j$. Note that $a_L$ may be
interpreted as failure (inoperative) cost. This interpretation leads to the one
period cost function $c_i(R)$ given for $i = 0, \ldots, L$ by

$$
c_i(R) = \begin{cases}
a_i & \text{if } p_{i0}(R) = 0, \\
K + a_i & \text{if } p_{i0}(R) = 1.
\end{cases}
$$


We are interested in replacement rules that minimize the expected long
run time average cost. This cost is given by the expected cost under the
limiting distribution for the Markov chain $\{X_n(R)\}$. Denoting this average
cost by $\phi(R)$, we have

$$
\phi(R) = \sum_{i=0}^{L} \pi_i(R)\, c_i(R),
$$

where $\pi_i(R) = \lim_{n \to \infty} \Pr\{X_n(R) = i\}$. The limiting distribution $\pi_i(R)$ is
determined by the equations

$$
\pi_i(R) = \sum_{k=0}^{L} \pi_k(R)\, p_{ki}(R), \qquad i = 0, \ldots, L,
$$

and

$$
\pi_0(R) + \pi_1(R) + \cdots + \pi_L(R) = 1.
$$

We define a control limit rule to be a replacement rule of the form

“Replace the system if and only if $X_n \ge k$,”

where k, called the control limit, is some fixed state between 0 and L. We
let $R_k$ denote the control limit rule with control limit equal to k. Then $R_0$ is
the rule “Replace the system at every step,” and $R_L$ is the rule “Replace
only upon failure (State L).”

Control limit rules seem reasonable provided that the states are labeled 
monotonically from best (0) to worst (L) in some sense. Indeed, it can be 
shown that a control limit rule is optimal whenever the following two con- 
ditions hold: 


(1) $a_0 \le a_1 \le \cdots \le a_L$.

(2) If $i \le j$, then $\sum_{m=k}^{L} p_{im} \le \sum_{m=k}^{L} p_{jm}$ for every $k = 0, \ldots, L$.

Condition (1) asserts that the one stage costs are higher in the “worse” 
states. Condition (2) asserts that further deterioration is more likely in the 
“worse” states. 

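Let us suppose that conditions (1) and (2) prevail. Then we need only
check the L + 1 control limit rules $R_0, \ldots, R_L$ in order to find an optimal
rule. Furthermore, it can be shown that a control limit $k^*$ satisfying
$\phi(R_{k^*-1}) \ge \phi(R_{k^*}) \le \phi(R_{k^*+1})$ is optimal, so that not always do all L + 1
control limit rules need to be checked.

Under control limit k we have the cost vector

$$
c(R_k) = (a_0, \ldots, a_{k-1}, K + a_k, \ldots, K + a_L),
$$

and the transition probabilities

$$
P(R_k) = \begin{array}{c|ccccccc}
 & 0 & 1 & \cdots & k-1 & k & \cdots & L \\ \hline
0 & 0 & p_{01} & \cdots & p_{0,k-1} & p_{0k} & \cdots & p_{0L} \\
1 & 0 & p_{11} & \cdots & p_{1,k-1} & p_{1k} & \cdots & p_{1L} \\
\vdots & \vdots & & & & & & \vdots \\
k-1 & 0 & p_{k-1,1} & \cdots & p_{k-1,k-1} & p_{k-1,k} & \cdots & p_{k-1,L} \\
k & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & & & & & & \vdots \\
L & 1 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{array}.
$$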


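To look at a numerical example, we will find the optimal control limit
$k^*$ for the following data: L = 5 and

$$
P = \begin{array}{c|cccccc}
 & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\
1 & 0 & 0.1 & 0.2 & 0.2 & 0.2 & 0.3 \\
2 & 0 & 0 & 0.1 & 0.2 & 0.3 & 0.4 \\
3 & 0 & 0 & 0 & 0.1 & 0.4 & 0.5 \\
4 & 0 & 0 & 0 & 0 & 0.4 & 0.6 \\
5 & 0 & 0 & 0 & 0 & 0 & 1
\end{array},
$$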


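$a_0 = \cdots = a_{L-1} = 0$, $a_L = 5$, and $K = 3$. When k = 1, the transition matrix
is

$$
P(R_1) = \begin{array}{c|cccccc}
 & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 & 0 & 0 \\
4 & 1 & 0 & 0 & 0 & 0 & 0 \\
5 & 1 & 0 & 0 & 0 & 0 & 0
\end{array},
$$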
which implies the following equations for the stationary distribution:

$$
\begin{aligned}
\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 &= \pi_0, \\
0.2\pi_0 &= \pi_1, \\
0.2\pi_0 &= \pi_2, \\
0.2\pi_0 &= \pi_3, \\
0.2\pi_0 &= \pi_4, \\
0.2\pi_0 &= \pi_5, \\
\pi_0 + \pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 &= 1.
\end{aligned}
$$

The solution is $\pi_0 = 0.5$ and $\pi_1 = \pi_2 = \pi_3 = \pi_4 = \pi_5 = 0.1$. The average
cost associated with k = 1 is

$$
\phi_1 = 0.5(0) + 0.1(3) + 0.1(3) + 0.1(3) + 0.1(3) + 0.1(3 + 5) = 2.0.
$$


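When k = 2, the transition matrix is

$$
P(R_2) = \begin{array}{c|cccccc}
 & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & 0 & 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \\
1 & 0 & 0.1 & 0.2 & 0.2 & 0.2 & 0.3 \\
2 & 1 & 0 & 0 & 0 & 0 & 0 \\
3 & 1 & 0 & 0 & 0 & 0 & 0 \\
4 & 1 & 0 & 0 & 0 & 0 & 0 \\
5 & 1 & 0 & 0 & 0 & 0 & 0
\end{array},
$$

and the associated stationary distribution is $\pi_0 = 0.450$, $\pi_1 = 0.100$, $\pi_2 =
\pi_3 = \pi_4 = 0.110$, $\pi_5 = 0.120$. We evaluate the average cost to be

$$
\phi_2 = 0.45(0) + 0.10(0) + 0.11(3) + 0.11(3) + 0.11(3) + 0.12(8) = 1.95.
$$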


Continuing in this manner, we obtain the following table:

Control                  Stationary Distribution                       Average
Limit                                                                  Cost
  k     $\pi_0$   $\pi_1$   $\pi_2$   $\pi_3$   $\pi_4$   $\pi_5$     $\phi_k$
  1     0.5000    0.1000    0.1000    0.1000    0.1000    0.1000      2.0000
  2     0.4500    0.1000    0.1100    0.1100    0.1100    0.1200      1.9500
  3     0.4010    0.0891    0.1089    0.1198    0.1307    0.1505      1.9555
  4     0.3539    0.0786    0.0961    0.1175    0.1623    0.1916      2.0197
  5     0.2785    0.06189   0.0756    0.0925    0.2139    0.2785      2.2280


The optimal control limit is $k^* = 2$, and the corresponding minimum
average cost per unit time is $\phi_2 = 1.95$.
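
The search over control limits can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' procedure: the helper names `stationary` and `average_cost` are ours, the stationary equations are solved by plain Gauss-Jordan elimination (which presumes the modified chain has a single stationary distribution), and the matrix entries are our reading of the example's data.

```python
def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination.

    Assumes the chain has exactly one stationary distribution.
    """
    n = len(P)
    # Balance equations for states 0..n-2; the last one is redundant and is
    # replaced by the normalization pi_0 + ... + pi_{n-1} = 1.
    A = [[P[k][i] - (1.0 if k == i else 0.0) for k in range(n)] + [0.0]
         for i in range(n - 1)]
    A.append([1.0] * n + [1.0])
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def average_cost(P, a, K, k):
    """Long run average cost under the control limit rule R_k."""
    L = len(P) - 1
    # States below k keep their rows; states k..L are replaced (jump to 0).
    modified = [list(P[i]) if i < k else [1.0] + [0.0] * L for i in range(L + 1)]
    cost = [a[i] if i < k else K + a[i] for i in range(L + 1)]
    pi = stationary(modified)
    return sum(p * c for p, c in zip(pi, cost))


P = [
    [0, 0.2, 0.2, 0.2, 0.2, 0.2],
    [0, 0.1, 0.2, 0.2, 0.2, 0.3],
    [0, 0,   0.1, 0.2, 0.3, 0.4],
    [0, 0,   0,   0.1, 0.4, 0.5],
    [0, 0,   0,   0,   0.4, 0.6],
    [0, 0,   0,   0,   0,   1.0],
]
a = [0, 0, 0, 0, 0, 5]   # operating costs a_0, ..., a_L
K = 3                    # replacement cost

costs = {k: average_cost(P, a, K, k) for k in range(1, 6)}
best_k = min(costs, key=costs.get)
for k, c in costs.items():
    print(f"k = {k}: average cost = {c:.4f}")
print("optimal control limit:", best_k)
```

A linear solve is used rather than iterating powers of the matrix because the chain under $R_1$ alternates between state 0 and the other states, so its powers do not converge.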


Exercises 


2.1. On a Southern Pacific island, a sunny day is followed by another
sunny day with probability 0.9, whereas a rainy day is followed by an- 
other rainy day with probability 0.2. Supposing that there are only sunny 
or rainy days, in the long run on what fraction of days is it sunny? 


2.2. In the reliability example of Section 2.2, what fraction of time is the
repair facility idle? When a second repair facility is added, what fraction 
of time is each facility idle? 


2.3. Determine the average fraction inspected, AFI, and the average out- 
going quality, AOQ, of Section 2.3 for p = 0, 0.05, 0.10, 0.15, ... , 0.50 
when 


(a) r = 10 and i = 5.
(b) r = 5 and i = 10.


2.4. Section 2.2 determined the availability R of a certain computer
system to be

$$
R_1 = \frac{1}{1 + p^2} \qquad \text{for one repair facility,}
$$

$$
R_2 = \frac{1 + p}{1 + p + p^2} \qquad \text{for two repair facilities,}
$$

where p is the computer failure probability on a single day. Compute and
compare $R_1$ and $R_2$ for p = 0.01, 0.02, 0.05, and 0.10.
2.5. From purchase to purchase, a particular customer switches brands
among products A, B, and C according to a Markov chain whose
transition probability matrix is

$$
P = \begin{array}{c|ccc}
 & A & B & C \\ \hline
A & 0.6 & 0.2 & 0.2 \\
B & 0.1 & 0.7 & 0.2 \\
C & 0.1 & 0.1 & 0.8
\end{array}.
$$


In the long run, what fraction of time does this customer purchase brand 
A? 


2.6. A component of a computer has an active life, measured in discrete
units, that is a random variable T where 


Pr{T = 1} = 0.1, Pr{T = 3} = 0.3, 
Pr{T = 2} = 0.2, Pr{T = 4} = 0.4. 


Suppose one starts with a fresh component, and each component is re- 
placed by a new component upon failure. Determine the long run proba- 
bility that a failure occurs in a given period. 


2.7. Consider a machine whose condition at any time can be observed 
and classified as being in one of the following three states: 

State 1: Good operating order 

State 2: Deteriorated operating order 

State 3: In repair 


We observe the condition of the machine at the end of each period in a
sequence of periods. Let $X_n$ denote the condition of the machine at the end
of period n for n = 1, 2, .... Let $X_0$ be the condition of the machine at the
start. We assume that the sequence of machine conditions is a Markov
chain with transition probabilities

$$
\begin{aligned}
P_{11} &= 0.9, & P_{12} &= 0.1, & P_{13} &= 0, \\
P_{21} &= 0, & P_{22} &= 0.9, & P_{23} &= 0.1, \\
P_{31} &= 1, & P_{32} &= 0, & P_{33} &= 0,
\end{aligned}
$$

and that the process starts in state $X_0 = 1$.




(a) Find Pr{X, = 1}. 
(b) Calculate the limiting distribution. 
(c) What is the long run rate of repairs per unit time? 


2.8. At the end of a month, a large retail store classifies each receivable 
account according to 


0: Current 

1: 30-60 days overdue 
2: 60-90 days overdue 
3: Over 90 days 


Each such account moves from state to state according to a Markov chain 
with transition probability matrix 
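$$
P = \begin{array}{c|cccc}
 & 0 & 1 & 2 & 3 \\ \hline
0 & 0.95 & 0.05 & 0 & 0 \\
1 & 0.50 & 0 & 0.50 & 0 \\
2 & 0.20 & 0 & 0 & 0.80 \\
3 & 0.10 & 0 & 0 & 0.90
\end{array}.
$$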


In the long run, what fraction of accounts are over 90 days overdue? 


Problems 


2.1. Consider a discrete-time periodic review inventory model (see III,
Section 3.1), and let $\xi_n$ be the total demand in period n. Let $X_n$ be the
inventory quantity on hand at the end of period n. Instead of following an
(s, S) policy, a (q, Q) policy will be used: If the stock level at the end of
a period is less than or equal to q = 2 units, then Q = 2 additional units
will be ordered and will be available at the beginning of the next period.
Otherwise, no ordering will take place. This is a (q, Q) policy with q = 2
and Q = 2. Assume that demand that is not filled in a period is lost (no
back ordering).


(a) Suppose that $X_0 = 4$, and that the period demands turn out to be
$\xi_1 = 3$, $\xi_2 = 4$, $\xi_3 = 0$, $\xi_4 = 2$. What are the end-of-period stock
levels for periods n = 1, 2, 3, 4?
(b) Suppose that $\xi_1, \xi_2, \ldots$ are independent random variables, each
having the probability distribution where


k =                 0     1     2     3     4
$\Pr\{\xi = k\}$    0.1   0.3   0.3   0.2   0.1


Then $X_0, X_1, \ldots$ is a Markov chain. Determine the transition probability
distribution, and the limiting distribution.


(c) In the long run, during what fraction of periods are orders placed? 


2.2. A system consists of two components operating in parallel: The
system functions if at least one of the components is operating. In any
single period, if both components are operating at the beginning of the
period, then each will fail, independently, during the period with probability
$\alpha$. When one component has failed, the remaining component fails during
a period with a higher probability $\beta$. There is a single repair facility, and
it takes two periods to repair a component.


(a) Define an appropriate set of states for the system in the manner of
the Reliability and Redundancy example, and specify the transition
probabilities in terms of $\alpha$ and $\beta$.

(b) When $\alpha = 0.1$ and $\beta = 0.2$, in the long run what fraction of time
is the system operating?


2.3. Suppose that a production process changes state according to a 
Markov process whose transition probability matrix is given by 


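$$
P = \begin{array}{c|cccc}
 & 0 & 1 & 2 & 3 \\ \hline
0 & 0.2 & 0.2 & 0.4 & 0.2 \\
1 & 0.5 & 0.2 & 0.2 & 0.1 \\
2 & 0.2 & 0.3 & 0.4 & 0.1 \\
3 & 0.1 & 0.2 & 0.4 & 0.3
\end{array}.
$$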


(a) Determine the limiting distribution for the process. 

(b) Suppose that states 0 and 1 are “in-control,” while states 2 and 3 are
deemed “out-of-control.” In the long run, what fraction of time is
the process out-of-control?

(c) In the long run, what fraction of transitions are from an in-control
state to an out-of-control state?


2.4. A component of a computer has an active life, measured in discrete
units, that is a random variable $\xi$, where


k = 1 2 3 4 
Pr{é = k} 


I 
© 
ay 
© 
1S) 
oO 
we) 
) 
~ 


Suppose that one starts with a fresh component, and each component is
replaced by a new component upon failure. Let $X_n$ be the remaining life of
the component in service at the end of period n. When $X_n = 0$, a new item
is placed into service at the start of the next period.


(a) Set up the transition probability matrix for $\{X_n\}$.

(b) By showing that the chain is regular and solving for the limiting
distribution, determine the long run probability that the item in
service at the end of a period has no remaining life and therefore will
be replaced.

(c) Relate this to the mean life of a component. 


2.5. Suppose that the weather on any day depends on the weather con- 
ditions during the previous two days. We form a Markov chain with the 
following states: 

State (S, S) if it was sunny both today and yesterday, 

State (S, C) if it was sunny yesterday but cloudy today, 

State (C, S) if it was cloudy yesterday but sunny today, 

State (C, C) if it was cloudy both today and yesterday, 


and transition probability matrix 


$$
P = \begin{array}{c|cccc}
 & \multicolumn{4}{c}{\text{Today’s State}} \\
 & (S, S) & (S, C) & (C, S) & (C, C) \\ \hline
(S, S) & 0.7 & 0.3 & 0 & 0 \\
(S, C) & 0 & 0 & 0.4 & 0.6 \\
(C, S) & 0.5 & 0.5 & 0 & 0 \\
(C, C) & 0 & 0 & 0.2 & 0.8
\end{array}.
$$

(a) Given that it is sunny on days 0 and 1, what is the probability it is
sunny on day 5?
(b) In the long run, what fraction of days are sunny?


2.6. Consider a computer system that fails on a given day with probability
p and remains “up” with probability q = 1 − p. Suppose the repair
time is a random variable N having the probability mass function $p(k) =
\beta(1 - \beta)^{k-1}$ for k = 1, 2, ..., where $0 < \beta < 1$. Let $X_n = 1$ if the
computer is operating on day n and $X_n = 0$ if not. Show that $\{X_n\}$ is a Markov
chain with transition matrix

$$
P = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & \alpha & \beta \\
1 & p & q
\end{array}
$$

and $\alpha = 1 - \beta$. Determine the long run probability that the computer is
operating in terms of $\alpha$, $\beta$, p, and q.


2.7. Customers arrive for service and take their place in a waiting line.
There is a single service facility, and a customer undergoing service at the
beginning of a period will complete service and depart at the end of the
period with probability $\beta$ and will continue service into the next period
with probability $\alpha = 1 - \beta$, and then the process repeats. This description
implies that the service time $\eta$ of an individual is a random variable with
the geometric distribution,

$$
\Pr\{\eta = k\} = \beta\alpha^{k-1} \qquad \text{for } k = 1, 2, \ldots,
$$

and the service times of distinct customers are independent random
variables.

At most a single customer can arrive during a period. We suppose that
the actual number of arrivals during the nth period is a random variable $\xi_n$
taking on the values 0 or 1 according to

$$
\Pr\{\xi_n = 0\} = p
$$

and

$$
\Pr\{\xi_n = 1\} = q = 1 - p \qquad \text{for } n = 0, 1, \ldots.
$$

The state $X_n$ of the system at the start of period n is defined to be the
number of customers in the system, either waiting or being served. Then
$\{X_n\}$ is a Markov chain. Specify the following transition probabilities in
terms of $\alpha$, $\beta$, p, and q: $P_{00}$, $P_{01}$, $P_{02}$, $P_{10}$, $P_{11}$, and $P_{12}$. State any additional
assumptions that you make.


2.8. An airline reservation system has a single computer, which breaks
down on any given day with probability p. It takes two days to restore a
failed computer to normal service. Form a Markov chain by taking as
states the pairs (x, y), where x is the number of machines in operating
condition at the end of a day and y is 1 if a day’s labor has been expended on
a machine, and 0 otherwise. The transition probability matrix is

$$
P = \begin{array}{c|ccc}
\text{From State} \rightarrow \text{To State} & (1, 0) & (0, 0) & (0, 1) \\ \hline
(1, 0) & q & p & 0 \\
(0, 0) & 0 & 0 & 1 \\
(0, 1) & 1 & 0 & 0
\end{array}.
$$

Compute the system availability $\pi_{(1,0)}$ for p = 0.01, 0.02, 0.05, and 0.10.


3. The Classification of States 


Not all Markov chains are regular. We consider some examples. 
The Markov chain whose transition probability matrix is the identity
matrix

$$
P = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & 1 & 0 \\
1 & 0 & 1
\end{array}
$$

remains always in the state in which it starts. Since trivially $P^n = P$ for all
n, the Markov chain $X_n$ has a limiting distribution, but it obviously
depends on the initial state.

The Markov chain whose transition probability matrix is

$$
P = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & 0 & 1 \\
1 & 1 & 0
\end{array}
$$

oscillates deterministically between the two states. The Markov chain is
periodic, and no limiting distribution exists. When n is an odd number,
then $P^n = P$, but when n is even, then $P^n$ is the 2 × 2 identity matrix.

When P is the matrix

$$
P = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & \tfrac{1}{2} & \tfrac{1}{2} \\
1 & 0 & 1
\end{array},
$$

$P^n$ is given by

$$
P^n = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & \left(\tfrac{1}{2}\right)^n & 1 - \left(\tfrac{1}{2}\right)^n \\
1 & 0 & 1
\end{array},
$$

and the limit is

$$
\lim_{n \to \infty} P^n = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & 0 & 1 \\
1 & 0 & 1
\end{array}.
$$

Here state 0 is transient; after the process starts from state 0, there is a
positive probability that it will never return to that state.

The three matrices just presented illustrate three distinct types of
behavior in addition to the convergence exemplified by a regular Markov
chain. Various and more elaborate combinations of these behaviors are 
also possible. Some definitions and classifications of states and matrices 
are needed in order to sort out the variety of possibilities. 


3.1. Irreducible Markov Chains 


State j is said to be accessible from state i if $P_{ij}^{(n)} > 0$ for some integer
$n \ge 0$; i.e., state j is accessible from state i if there is positive probability
that state j can be reached starting from state i in some finite number of
transitions. Two states i and j, each accessible to the other, are said to
communicate, and we write $i \leftrightarrow j$. If two states i and j do not communicate,
then either

$$
P_{ij}^{(n)} = 0 \qquad \text{for all } n \ge 0
$$

or

$$
P_{ji}^{(n)} = 0 \qquad \text{for all } n \ge 0
$$

or both relations are true. The concept of communication is an
equivalence relation:

(i) $i \leftrightarrow i$ (reflexivity), a consequence of the definition of

$$
P_{ij}^{(0)} = \delta_{ij} = \begin{cases} 1 & i = j, \\ 0 & i \ne j. \end{cases}
$$

(ii) If $i \leftrightarrow j$, then $j \leftrightarrow i$ (symmetry), from the definition of
communication.
(iii) If $i \leftrightarrow j$ and $j \leftrightarrow k$, then $i \leftrightarrow k$ (transitivity).


The proof of transitivity proceeds as follows: $i \leftrightarrow j$ and $j \leftrightarrow k$ imply that
there exist integers n and m such that $P_{ij}^{(n)} > 0$ and $P_{jk}^{(m)} > 0$. Consequently,
by the nonnegativity of each $P_{rs}^{(t)}$, we conclude that

$$
P_{ik}^{(n+m)} = \sum_{r=0}^{\infty} P_{ir}^{(n)} P_{rk}^{(m)} \ge P_{ij}^{(n)} P_{jk}^{(m)} > 0.
$$

A similar argument shows the existence of an integer $\nu$ such that $P_{ki}^{(\nu)} > 0$,
as desired.

We can now partition the totality of states into equivalence classes. The 
states in an equivalence class are those that communicate with each other. 
It may be possible starting in one class to enter some other class with pos- 
itive probability; if so, however, it is clearly not possible to return to the 
initial class, or else the two classes would together form a single class. We 
say that the Markov chain is irreducible if the equivalence relation induces 
only one class. In other words, a process is irreducible if all states
communicate with each other.

To illustrate this concept, we consider the transition probability matrix 


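$$
P = \left\|\begin{array}{cc|ccc}
\tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 & 0 \\
\tfrac{1}{4} & \tfrac{3}{4} & 0 & 0 & 0 \\ \hline
0 & 0 & 0 & 1 & 0 \\
0 & 0 & \tfrac{1}{2} & 0 & \tfrac{1}{2} \\
0 & 0 & 0 & 1 & 0
\end{array}\right\| = \left\|\begin{array}{cc} P_1 & 0 \\ 0 & P_2 \end{array}\right\|,
$$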
where $P_1$ is an abbreviation for the matrix formed from the initial
two rows and columns of P, and similarly for $P_2$. This Markov chain
clearly divides into the two classes composed of states {1, 2} and states
{3, 4, 5}.

If the state of $X_0$ lies in the first class, then the state of the system
thereafter remains in this class, and for all purposes the relevant transition
matrix is $P_1$. Similarly, if the initial state belongs to the second class, then the
relevant transition matrix is $P_2$. This is a situation where we have two
completely unrelated processes labeled together.

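In the random walk model with transition matrix

$$
P = \begin{array}{c|cccccccc}
\text{states} & 0 & 1 & 2 & 3 & \cdots & a-2 & a-1 & a \\ \hline
0 & 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & q & 0 & p & 0 & \cdots & 0 & 0 & 0 \\
2 & 0 & q & 0 & p & \cdots & 0 & 0 & 0 \\
\vdots & & & & & \ddots & & & \\
a-1 & 0 & 0 & 0 & 0 & \cdots & q & 0 & p \\
a & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1
\end{array} \tag{3.1}
$$

we have the three classes {0}, {1, 2, ..., a − 1}, and {a}. In this example
it is possible to reach the first class or third class from the second class,
but it is not possible to return to the second class from either the first or
the third class.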
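
The partition into communicating classes depends only on which entries of P are positive, so it can be found by a reachability search. The following is a minimal sketch; the function names and the choice a = 4, p = 0.5 for the random walk are illustrative assumptions.

```python
def reachable(P, i):
    """States accessible from i (i itself included, since P_ii^(0) = 1)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, prob in enumerate(P[u]):
            if prob > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def communicating_classes(P):
    """Partition the states into classes of mutually accessible states."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    classes, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        # j communicates with i when each is accessible from the other.
        cls = sorted(j for j in reach[i] if i in reach[j])
        classes.append(cls)
        assigned.update(cls)
    return classes


def random_walk(a, p):
    """Random walk on 0..a with absorbing barriers at 0 and a."""
    q = 1.0 - p
    P = [[0.0] * (a + 1) for _ in range(a + 1)]
    P[0][0] = P[a][a] = 1.0
    for i in range(1, a):
        P[i][i - 1] = q
        P[i][i + 1] = p
    return P


if __name__ == "__main__":
    print(communicating_classes(random_walk(a=4, p=0.5)))
```

A chain is irreducible exactly when `communicating_classes` returns a single class.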


3.2. Periodicity of a Markov Chain 


We define the period of state i, written d(i), to be the greatest common
divisor (g.c.d.) of all integers $n \ge 1$ for which $P_{ii}^{(n)} > 0$. [If $P_{ii}^{(n)} = 0$ for all
$n \ge 1$, define d(i) = 0.] In a random walk (3.1), every transient state
1, 2, ..., a − 1 has period 2. If $P_{ii} > 0$ for some single state i, then that
state now has period 1, since the system can remain in this state any length
of time.


In a finite Markov chain of n states with transition matrix

$$
P = \left\|\begin{array}{ccccc}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{array}\right\|,
$$

each state has period n.
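Consider the Markov chain whose transition probability matrix is

$$
P = \begin{array}{c|cccc}
 & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 \\
2 & 0 & 0 & 0 & 1 \\
3 & \tfrac{1}{2} & 0 & \tfrac{1}{2} & 0
\end{array}.
$$

We evaluate $P_{00} = 0$, $P_{00}^{(2)} = 0$, $P_{00}^{(3)} = 0$, $P_{00}^{(4)} = \tfrac{1}{2}$, $P_{00}^{(5)} = 0$, $P_{00}^{(6)} = \tfrac{1}{4}$. The set
of integers $n \ge 1$ for which $P_{00}^{(n)} > 0$ is {4, 6, 8, ...}. The period of state 0
is d(0) = 2, the greatest common divisor of this set.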
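
The period can be checked mechanically from the zero pattern of P alone. The sketch below is illustrative: the names `return_times` and `period` are ours, and the horizon `n_max` is an assumption, since the g.c.d. is taken over a finite stretch of return times and is only guaranteed to equal d(i) when that horizon is long enough.

```python
from functools import reduce
from math import gcd


def return_times(P, i, n_max):
    """All n in 1..n_max with P_ii^(n) > 0, using only which entries are positive."""
    n_states = len(P)
    current = {i}          # states reachable from i in exactly n steps
    times = []
    for n in range(1, n_max + 1):
        current = {v for u in current for v in range(n_states) if P[u][v] > 0}
        if i in current:
            times.append(n)
    return times


def period(P, i, n_max=50):
    """g.c.d. of the return times found up to n_max (0 if there are none)."""
    return reduce(gcd, return_times(P, i, n_max), 0)


# The four-state chain considered above.
P = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0.5, 0, 0.5, 0],
]

print(return_times(P, 0, 10))
print(period(P, 0))
```

Returning 0 when no return is possible matches the convention d(i) = 0 stated in the definition.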


Example Suppose that the precipitation in a certain locale depends on 
the season (Wet or Dry) as well as on the precipitation level (High, Low) 
during the preceding season. We model the process as a Markov chain 
whose states are of the form (x, y), where x denotes the season (W = Wet, 
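D = Dry) and y denotes the precipitation level (H = High, L = Low).
Suppose the transition probability matrix is

$$
P = \begin{array}{c|cccc}
 & (W, H) & (W, L) & (D, H) & (D, L) \\ \hline
(W, H) & 0 & 0 & 0.8 & 0.2 \\
(W, L) & 0 & 0 & 0.4 & 0.6 \\
(D, H) & 0.7 & 0.3 & 0 & 0 \\
(D, L) & 0.2 & 0.8 & 0 & 0
\end{array}.
$$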

All states are periodic with period d = 2. 
A situation in which the demand for an inventory item depends on the
month of the year as well as on the demand during the previous month
would lead to a Markov chain whose states had period d = 12.

The random walk on the states $0, \pm 1, \pm 2, \ldots$ with probabilities
$P_{i,i+1} = p$, $P_{i,i-1} = q = 1 - p$ is periodic with period d = 2.

We state, without proof, three basic properties of the period of a state: 


1. If $i \leftrightarrow j$ then d(i) = d(j).


This assertion shows that the period is a constant in each class of
communicating states.


2. If state i has period d(i), then there exists an integer N depending on
i such that for all integers $n \ge N$,

$$
P_{ii}^{(n d(i))} > 0.
$$

This asserts that a return to state i can occur at all sufficiently large
multiples of the period d(i).


3. If $P_{ji}^{(m)} > 0$, then $P_{ji}^{(m + n d(i))} > 0$ for all n (a positive integer) sufficiently
large.


A Markov chain in which each state has period 1 is called aperiodic.
The vast majority of Markov chain processes we deal with are aperiodic. 
Results will be developed for the aperiodic case, and the modified con- 
clusions for the general case will be stated, usually without proof. 


3.3. Recurrent and Transient States 


Consider an arbitrary, but fixed, state i. We define, for each integer $n \ge 1$,

$$
f_{ii}^{(n)} = \Pr\{X_n = i,\ X_\nu \ne i,\ \nu = 1, 2, \ldots, n - 1 \mid X_0 = i\}.
$$

In other words, $f_{ii}^{(n)}$ is the probability that starting from state i, the first
return to state i occurs at the nth transition. Clearly $f_{ii}^{(1)} = P_{ii}$, and $f_{ii}^{(n)}$ may be
calculated recursively according to

$$
P_{ii}^{(n)} = \sum_{k=0}^{n} f_{ii}^{(k)} P_{ii}^{(n-k)}, \qquad n \ge 1, \tag{3.2}
$$

where we define $f_{ii}^{(0)} = 0$ for all i. Equation (3.2) is derived by decomposing
the event from which $P_{ii}^{(n)}$ is computed according to the time of the first
return to state i. Indeed, consider all the possible realizations of the
process for which $X_0 = i$, $X_n = i$, and the first return to state i occurs at the
kth transition. Call this event $E_k$. The events $E_k$ (k = 1, 2, ..., n) are
clearly mutually exclusive. The probability of the event that the first
return is at the kth transition is by definition $f_{ii}^{(k)}$. In the remaining n − k
transitions, we are dealing only with those realizations for which $X_n = i$.
Using the Markov property, we have

$$
\begin{aligned}
\Pr\{E_k\} &= \Pr\{\text{first return is at } k\text{th transition} \mid X_0 = i\}\, \Pr\{X_n = i \mid X_k = i\} \\
&= f_{ii}^{(k)} P_{ii}^{(n-k)}, \qquad 1 \le k \le n
\end{aligned}
$$

(recall that $P_{ii}^{(0)} = 1$). Hence

$$
\Pr\{X_n = i \mid X_0 = i\} = \sum_{k=1}^{n} \Pr\{E_k\} = \sum_{k=1}^{n} f_{ii}^{(k)} P_{ii}^{(n-k)} = \sum_{k=0}^{n} f_{ii}^{(k)} P_{ii}^{(n-k)},
$$

since by definition $f_{ii}^{(0)} = 0$. The verification of (3.2) is now complete.
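
Rearranged, (3.2) gives $f_{ii}^{(n)} = P_{ii}^{(n)} - \sum_{k=1}^{n-1} f_{ii}^{(k)} P_{ii}^{(n-k)}$, which can be evaluated step by step. The sketch below is an illustration only: the function names are ours, exact fractions are used to avoid rounding, and the two-state matrix is a small test case chosen for this purpose.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def first_return_probs(P, i, n_max):
    """Return [f_ii^(0), ..., f_ii^(n_max)] computed from recursion (3.2)."""
    n_states = len(P)
    power = [[Fraction(int(r == c)) for c in range(n_states)]
             for r in range(n_states)]            # P^0 is the identity
    p_ii = [power[i][i]]                          # P_ii^(0) = 1
    for _ in range(n_max):
        power = mat_mul(power, P)
        p_ii.append(power[i][i])
    f = [Fraction(0)]                             # f_ii^(0) = 0 by definition
    for n in range(1, n_max + 1):
        f.append(p_ii[n] - sum(f[k] * p_ii[n - k] for k in range(1, n)))
    return f


half = Fraction(1, 2)
P = [[half, half], [Fraction(0), Fraction(1)]]    # illustrative two-state chain

f0 = first_return_probs(P, 0, 6)
print([str(x) for x in f0])
print("probability of returning to state 0 within 6 steps:", sum(f0))
```

For this chain the only way to return to state 0 is to stay there on the first step, so the partial sums of $f_{00}^{(n)}$ stay at 1/2 no matter how many steps are included.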
When the process starts from state i, the probability that it returns to
state i at some time is

$$
f_{ii} = \sum_{n=0}^{\infty} f_{ii}^{(n)} = \lim_{N \to \infty} \sum_{n=0}^{N} f_{ii}^{(n)}. \tag{3.3}
$$

We say that a state i is recurrent if $f_{ii} = 1$. This definition says that a state
i is recurrent if and only if, after the process starts from state i, the
probability of its returning to state i after some finite length of time is one. A
nonrecurrent state is said to be transient.

Consider a transient state i. Then the probability that a process starting
from state i returns to state i at least once is $f_{ii} < 1$. Because of the Markov
property, the probability that the process returns to state i at least twice is
$(f_{ii})^2$, and repeating the argument, we see that the probability that the
process returns to i at least k times is $(f_{ii})^k$ for k = 1, 2, .... Let M be the
random variable that counts the number of times that the process returns
to i. Then we have shown that M has the geometric distribution in which

$$
\Pr\{M \ge k \mid X_0 = i\} = (f_{ii})^k \qquad \text{for } k = 1, 2, \ldots \tag{3.4}
$$

and

$$
E[M \mid X_0 = i] = \frac{f_{ii}}{1 - f_{ii}}. \tag{3.5}
$$
Theorem 3.1 establishes a criterion for the recurrence of a state i in
terms of the transition probabilities $P_{ii}^{(n)}$.


Theorem 3.1. A state i is recurrent if and only if

$$
\sum_{n=1}^{\infty} P_{ii}^{(n)} = \infty.
$$

Equivalently, state i is transient if and only if $\sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty$.


Proof Suppose first that state i is transient so that, by definition, $f_{ii} < 1$,
and let M count the total number of returns to state i. We write M in terms
of indicator random variables as

$$
M = \sum_{n=1}^{\infty} \mathbf{1}\{X_n = i\},
$$

where

$$
\mathbf{1}\{X_n = i\} = \begin{cases} 1 & \text{if } X_n = i, \\ 0 & \text{if } X_n \ne i. \end{cases}
$$

Now, equation (3.5) shows that $E[M \mid X_0 = i] < \infty$ when i is transient. But
then

$$
\infty > E[M \mid X_0 = i] = \sum_{n=1}^{\infty} E[\mathbf{1}\{X_n = i\} \mid X_0 = i] = \sum_{n=1}^{\infty} P_{ii}^{(n)},
$$

as claimed.

Conversely, suppose $\sum_{n=1}^{\infty} P_{ii}^{(n)} < \infty$. Then M is a random variable whose
mean is finite, and thus M must be finite. That is, starting from state i, the
process returns to state i only a finite number of times. Then there must be
a positive probability that, starting from state i, the process never returns
to that state. In other words $1 - f_{ii} > 0$ or $f_{ii} < 1$, as claimed.


Corollary 3.1 If $i \leftrightarrow j$ and if i is recurrent, then j is recurrent.

Proof Since $i \leftrightarrow j$, there exist $m, n \ge 1$ such that

$$
P_{ij}^{(n)} > 0 \qquad \text{and} \qquad P_{ji}^{(m)} > 0.
$$

Let $\nu > 0$. We obtain, by the usual argument (see Section 3.1),
$P_{jj}^{(m+n+\nu)} \ge P_{ji}^{(m)} P_{ii}^{(\nu)} P_{ij}^{(n)}$ and, on summing,

$$
\sum_{\nu=0}^{\infty} P_{jj}^{(m+n+\nu)} \ge \sum_{\nu=0}^{\infty} P_{ji}^{(m)} P_{ii}^{(\nu)} P_{ij}^{(n)} = P_{ji}^{(m)} P_{ij}^{(n)} \sum_{\nu=0}^{\infty} P_{ii}^{(\nu)}.
$$

Hence if $\sum_{\nu=0}^{\infty} P_{ii}^{(\nu)}$ diverges, then $\sum_{\nu=0}^{\infty} P_{jj}^{(\nu)}$ also diverges. □

This corollary proves that recurrence, like periodicity, is a class
property; that is, all states in an equivalence class are either recurrent or
nonrecurrent.


Example Consider the one-dimensional random walk on the positive
and negative integers, where at each transition the particle moves with
probability $p$ one unit to the right and with probability $q$ one unit to the
left ($p + q = 1$). Hence

$$P_{00}^{(2n+1)} = 0, \quad n = 0, 1, 2, \ldots,$$

and

$$P_{00}^{(2n)} = \binom{2n}{n} p^n q^n = \frac{(2n)!}{n!\,n!}\, p^n q^n. \tag{3.6}$$

We appeal now to Stirling's formula (see I, (6.10)),

$$n! \sim n^{n+1/2} e^{-n} \sqrt{2\pi}. \tag{3.7}$$

Applying (3.7) to (3.6), we obtain

$$P_{00}^{(2n)} \sim \frac{(pq)^n 2^{2n}}{\sqrt{\pi n}} = \frac{(4pq)^n}{\sqrt{\pi n}}.$$

It is readily verified that $p(1 - p) = pq \le \tfrac14$, with equality holding if and
only if $p = q = \tfrac12$. Hence $\sum_{n=0}^{\infty} P_{00}^{(n)} = \infty$ if and only if $p = \tfrac12$. Therefore, from
Theorem 3.1, the one-dimensional random walk is recurrent if and only if
$p = q = \tfrac12$. Remember that recurrence is a class property. Intuitively, if
$p \ne q$, there is positive probability that a particle initially at the origin will
drift to $+\infty$ if $p > q$ (to $-\infty$ if $p < q$) without ever returning to the origin.
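The divergence criterion of Theorem 3.1 can be watched numerically for this random walk. The following Python sketch is an illustration only, not part of the original argument; the function name and the choice of 10,000 terms are our own. It accumulates the partial sums of $P_{00}^{(2n)}$ for $p = 0.5$ and for $p = 0.6$, and compares the latter with the closed form $1/\sqrt{1 - 4pq} - 1$ that follows from the standard generating function $\sum_n \binom{2n}{n} x^n = 1/\sqrt{1 - 4x}$.

```python
import math


def return_prob_partial_sum(p, n_terms):
    """Partial sum of P_00^(2n) = C(2n, n) (pq)^n for n = 1, ..., n_terms."""
    q = 1.0 - p
    term = 1.0  # value of C(2n, n) (pq)^n at n = 0
    total = 0.0
    for n in range(1, n_terms + 1):
        # Ratio of consecutive terms: C(2n, n) / C(2n-2, n-1) = (2n)(2n-1) / n^2.
        term *= (2 * n) * (2 * n - 1) / (n * n) * p * q
        total += term
    return total


# Symmetric walk: the partial sums keep growing (roughly like sqrt(n)).
symmetric_sum = return_prob_partial_sum(0.5, 10_000)
# Biased walk: the partial sums settle at a finite value.
biased_sum = return_prob_partial_sum(0.6, 10_000)
biased_limit = 1.0 / math.sqrt(1.0 - 4 * 0.6 * 0.4) - 1.0
print(symmetric_sum, biased_sum, biased_limit)
```

Updating each term from the previous one avoids forming the very large binomial coefficients directly.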
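Exercises


3.1. A Markov chain has a transition probability matrix

$$
P = \begin{array}{c|cccccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
3 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
4 & 0.5 & 0 & 0 & 0 & 0 & 0.5 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
6 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
7 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}
$$

For which integers $n = 1, 2, \ldots, 20$ is it true that

$$P_{00}^{(n)} > 0?$$

What is the period of the Markov chain?


Hint: One need not compute the actual probabilities. See Section 1.1.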


3.2. Which states are transient and which are recurrent in the Markov 
chain whose transition probability matrix is 


0123 45 
o1/ 0 ! 00: 
ili +: 4 00 0 
20 0 0 0 1 Of, 
34 ¢ 4 0 O | 
410 0 100 0 
si0 00001 
3.3. A Markov chain on states {0, 1, 2, 3, 4, 5} has transition probability
matrices


,+ 0 ¢ 0 0 0 
0:0: 0 0 
@ |i & 3 9 O OF 
0:0 20 0 
i; 4 00 5 | 
ro} obo4ao4 4 
6 6 6 6 6 6 
100 0 0 0 
0? i 00 0 
0 i :00 0 
(re ° 
1 4 O g 3g O 
10 4 4 40 
000 0 0 1 


Find all communicating classes. 


3.4. Determine the communicating classes and period for each state of 
the Markov chain whose transition probability matrix is 


0 12 3 4 5 
O||; 0 0 0 3; O 
1;0 0 1 0 0 0 
210 0 0 1 0 0 
310 0 0 0 1 O 
470 0 0 0 0 1 
S}O 0 3 3 O 3 
Problems

3.1. A two state Markov chain has the transition probability matrix

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & 1 - a & a \\
1 & b & 1 - b
\end{array}.
$$

(a) Determine the first return distribution

$$f_{00}^{(n)} = \Pr\{X_1 \ne 0, \ldots, X_{n-1} \ne 0, X_n = 0 \mid X_0 = 0\}.$$

(b) Verify equation (3.2) when $i = 0$. (Refer to III, (5.2).)

3.2. Show that a finite state aperiodic irreducible Markov chain is regular
and recurrent.

3.3. Recall the first return distribution (Section 3.3)

$$f_{ii}^{(n)} = \Pr\{X_1 \ne i, X_2 \ne i, \ldots, X_{n-1} \ne i, X_n = i \mid X_0 = i\} \quad \text{for } n = 1, 2, \ldots,$$

with $f_{ii}^{(0)} = 0$ by convention. Using equation (3.2), determine $f_{00}^{(n)}$, $n = 1, 2,
3, 4, 5$, for the Markov chain whose transition probability matrix is


0 1 2 3 
Oy0 5 O 5 
11;0 0 1 QO 
24,0 0 O 1 
3]; 0 0 3 


4. The Basic Limit Theorem of Markov Chains 


Consider a recurrent state $i$. Then

$$f_{ii}^{(n)} = \Pr\{X_n = i,\ X_\nu \ne i \text{ for } \nu = 1, \ldots, n - 1 \mid X_0 = i\} \tag{4.1}$$

is the probability distribution of the first return time

$$R_i = \min\{n \ge 1;\ X_n = i\}. \tag{4.2}$$

This is

$$f_{ii}^{(n)} = \Pr\{R_i = n \mid X_0 = i\} \quad \text{for } n = 1, 2, \ldots. \tag{4.3}$$

Since state $i$ is recurrent by assumption, $f_{ii} = \sum_{n=1}^{\infty} f_{ii}^{(n)} = 1$, and $R_i$ is
a finite-valued random variable. The mean duration between visits to state
$i$ is

$$m_i = E[R_i \mid X_0 = i] = \sum_{n=1}^{\infty} n f_{ii}^{(n)}. \tag{4.4}$$

After starting in $i$, then, on the average, the process is in state $i$ once every
$m_i = E[R_i \mid X_0 = i]$ units of time. The basic limit theorem of Markov chains
states this result in a sharpened form.


Theorem 4.1 The basic limit theorem of Markov chains
(a) Consider a recurrent irreducible aperiodic Markov chain. Let
$P_{ii}^{(n)}$ be the probability of entering state $i$ at the $n$th transition, $n = 0, 1,
2, \ldots$, given that $X_0 = i$ (the initial state is $i$). By our earlier convention
$P_{ii}^{(0)} = 1$. Let $f_{ii}^{(n)}$ be the probability of first returning to state $i$ at the $n$th
transition, $n = 0, 1, 2, \ldots$, where $f_{ii}^{(0)} = 0$. Then

$$\lim_{n \to \infty} P_{ii}^{(n)} = \frac{1}{\sum_{n=0}^{\infty} n f_{ii}^{(n)}} = \frac{1}{m_i}. \tag{4.5}$$

(b) Under the same conditions as in (a), $\lim_{n \to \infty} P_{ji}^{(n)} = \lim_{n \to \infty} P_{ii}^{(n)}$ for all
states $j$.
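To see (4.5) at work numerically, the sketch below recovers the first return distribution from the $n$-step return probabilities and compares $1/m_i$ with $P_{ii}^{(n)}$ for large $n$. It is an illustration under stated assumptions: the two-state chain with $\alpha = 0.3$, $\beta = 0.2$ is a hypothetical example of ours, and the recursion used is the first-return decomposition $P_{ii}^{(n)} = \sum_{k=1}^{n} f_{ii}^{(k)} P_{ii}^{(n-k)}$, which we take to be the relation the text cites as equation (3.2).

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def return_probabilities(P, i, n_max):
    """Return [P_ii^(0), P_ii^(1), ..., P_ii^(n_max)] from successive powers of P."""
    n = len(P)
    power = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    probs = []
    for _ in range(n_max + 1):
        probs.append(power[i][i])
        power = mat_mul(power, P)
    return probs


def first_return_distribution(p_ii):
    """Solve P_ii^(n) = sum_{k=1}^{n} f_ii^(k) P_ii^(n-k) for f_ii^(1), f_ii^(2), ..."""
    f = [0.0]  # f_ii^(0) = 0 by convention
    for n in range(1, len(p_ii)):
        f.append(p_ii[n] - sum(f[k] * p_ii[n - k] for k in range(1, n)))
    return f


alpha, beta = 0.3, 0.2  # hypothetical two-state chain, chosen for illustration
P = [[1 - alpha, alpha], [beta, 1 - beta]]
p_00 = return_probabilities(P, 0, 200)
f_00 = first_return_distribution(p_00)
mean_return_time = sum(n * f for n, f in enumerate(f_00))  # truncated version of (4.4)
print(p_00[-1], 1 / mean_return_time)
```

For this chain both printed numbers should be close to $\beta/(\alpha + \beta) = 0.4$, in agreement with the theorem.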


Remark Let $C$ be a recurrent class. Then $P_{ij}^{(n)} = 0$ for $i \in C$, $j \notin C$, and
every $n$. Hence, once in $C$ it is not possible to leave $C$. It follows that the
submatrix $\|P_{ij}\|$, $i, j \in C$, is a transition probability matrix and the
associated Markov chain is irreducible and recurrent. The limit theorem,
therefore, applies verbatim to any aperiodic recurrent class.

If $\lim_{n \to \infty} P_{ii}^{(n)} > 0$ for one $i$ in an aperiodic recurrent class, then $\pi_j > 0$
for all $j$ in the class of $i$. In this case, we call the class positive recurrent
or strongly ergodic. If each $\pi_i = 0$ and the class is recurrent, we speak of
the class as null recurrent or weakly ergodic. In terms of the first return
time $R_i = \min\{n \ge 1;\ X_n = i\}$, state $i$ is positive recurrent if $m_i =
E[R_i \mid X_0 = i] < \infty$ and null recurrent if $m_i = \infty$. This statement is immediate
from the equality $\lim_{n \to \infty} P_{ii}^{(n)} = \pi_i = 1/m_i$. An alternative method for
determining the limiting distribution $\pi_i$ for a positive recurrent aperiodic
class is given in Theorem 4.2.


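Theorem 4.2 In a positive recurrent aperiodic class with states $j = 0,
1, 2, \ldots$,

$$\lim_{n \to \infty} P_{jj}^{(n)} = \pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}, \qquad \sum_{i=0}^{\infty} \pi_i = 1,$$

and the $\pi$'s are uniquely determined by the set of equations

$$\pi_i \ge 0, \qquad \sum_{i=0}^{\infty} \pi_i = 1, \qquad \text{and} \qquad \pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij} \quad \text{for } j = 0, 1, \ldots. \tag{4.6}$$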


Any set $(\pi_i)_{i=0}^{\infty}$ satisfying (4.6) is called a stationary probability
distribution of the Markov chain. The term "stationary" derives from the
property that a Markov chain started according to a stationary distribution will
follow this distribution at all points of time. Formally, if $\Pr\{X_0 = i\} = \pi_i$,
then $\Pr\{X_n = i\} = \pi_i$ for all $n = 1, 2, \ldots$. We check this for the case
$n = 1$; the general case follows by induction. We write

$$\Pr\{X_1 = i\} = \sum_{k=0}^{\infty} \Pr\{X_0 = k\} \Pr\{X_1 = i \mid X_0 = k\} = \sum_{k=0}^{\infty} \pi_k P_{ki} = \pi_i,$$

where the last equality follows because $\pi = (\pi_0, \pi_1, \ldots)$ is a stationary
distribution. When the initial state $X_0$ is selected according to the stationary
distribution, then the joint probability distribution of $(X_n, X_{n+1})$ is given
by

$$\Pr\{X_n = i, X_{n+1} = j\} = \pi_i P_{ij}.$$

The reader should supply the proof. 
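The stationarity equations (4.6) also suggest a simple numerical procedure: start from any probability vector and apply the transition matrix repeatedly. The Python sketch below is an illustration only. It assumes a regular chain and uses the four-state matrix of Exercise 4.2 as we read it from the text; the helper name `step` and the iteration count of 500 are our own choices.

```python
def step(dist, P):
    """One transition of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[k] * P[k][j] for k in range(n)) for j in range(n)]


# Transition matrix of Exercise 4.2 (states 0, 1, 2, 3), as read from the text.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.1, 0.4, 0.2, 0.3],
     [0.2, 0.2, 0.5, 0.1],
     [0.3, 0.3, 0.4, 0.0]]

pi = [0.25, 0.25, 0.25, 0.25]  # arbitrary initial distribution
for _ in range(500):
    pi = step(pi, P)

# A stationary distribution is left unchanged by one more transition.
residual = max(abs(a - b) for a, b in zip(pi, step(pi, P)))
print(pi, sum(pi), residual)
```

The iteration converges here because the chain is irreducible and aperiodic; for a periodic chain the iterates may oscillate even though a stationary distribution exists.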

A limiting distribution, when it exists, is always a stationary distribution,
but the converse is not true. There may exist a stationary distribution
but no limiting distribution. For example, there is no limiting distribution
for the periodic Markov chain whose transition probability matrix is

$$P = \begin{Vmatrix} 0 & 1 \\ 1 & 0 \end{Vmatrix},$$

but $\pi = (\tfrac12, \tfrac12)$ is a stationary distribution, since

$$\left(\tfrac12, \tfrac12\right) \begin{Vmatrix} 0 & 1 \\ 1 & 0 \end{Vmatrix} = \left(\tfrac12, \tfrac12\right).$$
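A few lines of Python make the distinction concrete for this two-state periodic chain. This is an illustrative sketch only (the variable names are ours): the return probabilities $P_{00}^{(m)}$ alternate between 1 and 0 and so have no limit, their time average settles at $\tfrac12$, and the vector $(\tfrac12, \tfrac12)$ is unchanged by one transition.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


P = [[0, 1], [1, 0]]
power = [[1, 0], [0, 1]]  # P^0
p_00 = []
for m in range(1000):
    p_00.append(power[0][0])  # P_00^(m)
    power = mat_mul(power, P)

time_average = sum(p_00) / len(p_00)

pi = [0.5, 0.5]
pi_next = [sum(pi[k] * P[k][j] for k in range(2)) for j in range(2)]
print(p_00[:6], time_average, pi_next)
```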


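Example Consider the class of random walks whose transition matrices
are given by

$$
P = \|P_{ij}\| = \begin{Vmatrix}
0 & 1 & 0 & 0 & \cdots \\
q_1 & 0 & p_1 & 0 & \cdots \\
0 & q_2 & 0 & p_2 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix}.
$$

This Markov chain has period 2. Nevertheless we investigate the existence
of a stationary probability distribution; i.e., we wish to determine
the positive solutions of

$$x_i = \sum_{j=0}^{\infty} x_j P_{ji} = p_{i-1} x_{i-1} + q_{i+1} x_{i+1}, \quad i = 0, 1, \ldots, \tag{4.7}$$

under the normalization

$$\sum_{i=0}^{\infty} x_i = 1,$$

where $p_{-1} = 0$ and $p_0 = 1$, and thus $x_0 = q_1 x_1$. Using equation (4.7) for
$i = 1$, we could determine $x_2$ in terms of $x_0$. Equation (4.7) for $i = 2$
determines $x_3$ in terms of $x_0$, and so forth. It is immediately verified that

$$x_i = \frac{p_{i-1} p_{i-2} \cdots p_1}{q_i q_{i-1} \cdots q_1}\, x_0 = x_0 \prod_{k=0}^{i-1} \frac{p_k}{q_{k+1}}, \quad i \ge 1,$$

is a solution of (4.7), with $x_0$ still to be determined. Now, since

$$1 = x_0 + \sum_{i=1}^{\infty} x_0 \prod_{k=0}^{i-1} \frac{p_k}{q_{k+1}},$$

we have

$$x_0 = \frac{1}{1 + \sum_{i=1}^{\infty} \prod_{k=0}^{i-1} \dfrac{p_k}{q_{k+1}}},$$

and so

$$x_0 > 0 \quad \text{if and only if} \quad \sum_{i=1}^{\infty} \prod_{k=0}^{i-1} \frac{p_k}{q_{k+1}} < \infty.$$

In particular, if $p_k = p$ and $q_k = q = 1 - p$ for $k \ge 1$, the series

$$\sum_{i=1}^{\infty} \prod_{k=0}^{i-1} \frac{p_k}{q_{k+1}} = \frac{1}{p} \sum_{i=1}^{\infty} \left(\frac{p}{q}\right)^i$$

converges only when $p < q$, and then

$$\frac{1}{p} \sum_{i=1}^{\infty} \left(\frac{p}{q}\right)^i = \frac{1}{p} \cdot \frac{p/q}{1 - p/q} = \frac{1}{q - p},$$

and

$$x_0 = \frac{1}{1 + 1/(q - p)} = \frac{q - p}{1 + q - p} = \frac{1}{2}\left(1 - \frac{p}{q}\right),$$

$$x_k = \frac{1}{p}\left(\frac{p}{q}\right)^k x_0 = \frac{1}{2p}\left(1 - \frac{p}{q}\right)\left(\frac{p}{q}\right)^k \quad \text{for } k = 1, 2, \ldots.$$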
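The closed form for the homogeneous case can be checked directly against the balance equations (4.7). The sketch below is illustrative only: it assumes $p_0 = 1$ and $p_k = p$, $q_k = q = 1 - p$ for $k \ge 1$ with $p < q$, uses the hypothetical value $p = 0.3$, and truncates the state space at 400 states, which is harmless because the weights decay geometrically.

```python
def stationary_prob(k, p):
    """x_k for the walk with p_0 = 1 and p_k = p, q_k = 1 - p for k >= 1 (p < 1/2)."""
    q = 1.0 - p
    if k == 0:
        return 0.5 * (1.0 - p / q)
    return (1.0 / (2.0 * p)) * (1.0 - p / q) * (p / q) ** k


def up_prob(k, p):
    """p_k: probability of stepping from k to k + 1 (state 0 always steps up)."""
    return 1.0 if k == 0 else p


def down_prob(k, p):
    """q_k: probability of stepping from k to k - 1 (impossible from state 0)."""
    return 0.0 if k == 0 else 1.0 - p


p = 0.3  # hypothetical value with p < q
x = [stationary_prob(k, p) for k in range(400)]
total = sum(x)

# Largest violation of x_i = p_{i-1} x_{i-1} + q_{i+1} x_{i+1} over i = 0, ..., 49.
worst = max(
    abs(x[i] - ((up_prob(i - 1, p) * x[i - 1] if i > 0 else 0.0)
                + down_prob(i + 1, p) * x[i + 1]))
    for i in range(50)
)
print(total, worst)
```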


Example Consider now the Markov chain that represents the success
runs of binomial trials. The transition probability matrix is

$$
P = \begin{Vmatrix}
p_0 & 1 - p_0 & 0 & 0 & \cdots \\
p_1 & 0 & 1 - p_1 & 0 & \cdots \\
p_2 & 0 & 0 & 1 - p_2 & \cdots \\
\vdots & & & & \ddots
\end{Vmatrix}
\qquad (0 < p_k < 1).
$$

The states of this Markov chain all belong to the same equivalence class
(any state can be reached from any other state). Since recurrence is a class
property (see Corollary 3.1), we will investigate recurrence for the zeroth
state.

Let $R_0 = \min\{n \ge 1;\ X_n = 0\}$ be the time of first return to state 0. It is
easy to evaluate

$$
\begin{aligned}
\Pr\{R_0 > 1 \mid X_0 = 0\} &= (1 - p_0), \\
\Pr\{R_0 > 2 \mid X_0 = 0\} &= (1 - p_0)(1 - p_1), \\
\Pr\{R_0 > 3 \mid X_0 = 0\} &= (1 - p_0)(1 - p_1)(1 - p_2), \\
&\ \,\vdots \\
\Pr\{R_0 > k \mid X_0 = 0\} &= (1 - p_0)(1 - p_1) \cdots (1 - p_{k-1}) = \prod_{i=0}^{k-1} (1 - p_i).
\end{aligned}
$$

In terms of the first return distribution

$$f_{00}^{(n)} = \Pr\{R_0 = n \mid X_0 = 0\},$$

we have

$$\Pr\{R_0 > k \mid X_0 = 0\} = 1 - \sum_{n=1}^{k} f_{00}^{(n)},$$

or

$$\sum_{n=1}^{k} f_{00}^{(n)} = 1 - \Pr\{R_0 > k \mid X_0 = 0\} = 1 - \prod_{i=0}^{k-1} (1 - p_i).$$

By definition, state 0 is recurrent provided $\sum_{n=1}^{\infty} f_{00}^{(n)} = 1$. In terms of
$p_0, p_1, \ldots$ then, state 0 is recurrent whenever $\lim_{k \to \infty} \prod_{i=0}^{k-1} (1 - p_i) =
\prod_{i=0}^{\infty} (1 - p_i) = 0$. Lemma 4.1 shows that $\prod_{i=0}^{\infty} (1 - p_i) = 0$ is equivalent,
in this case, to the condition $\sum_{i=0}^{\infty} p_i = \infty$.


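Lemma 4.1 If $0 < p_i < 1$, $i = 0, 1, 2, \ldots$, then $u_m = \prod_{i=0}^{m} (1 - p_i) \to 0$
as $m \to \infty$ if and only if $\sum_{i=0}^{\infty} p_i = \infty$.


Proof Assume $\sum_{i=0}^{\infty} p_i = \infty$. Since the series expansion for $\exp(-p_i)$ is
an alternating series with terms decreasing in absolute value, we can write

$$1 - p_i < 1 - p_i + \frac{p_i^2}{2!} - \frac{p_i^3}{3!} + \cdots = \exp(-p_i), \quad i = 0, 1, 2, \ldots. \tag{4.8}$$

Since (4.8) holds for all $i$, we obtain $\prod_{i=0}^{m} (1 - p_i) < \exp\left(-\sum_{i=0}^{m} p_i\right)$. But
by assumption,

$$\lim_{m \to \infty} \sum_{i=0}^{m} p_i = \infty;$$

hence

$$\lim_{m \to \infty} \prod_{i=0}^{m} (1 - p_i) = 0.$$

To prove necessity, observe that from a straightforward induction,

$$\prod_{i=j}^{m} (1 - p_i) > 1 - p_j - p_{j+1} - \cdots - p_m$$

for any $j$ and all $m = j + 1, j + 2, \ldots$. Assume now that $\sum_{i=1}^{\infty} p_i < \infty$; then
$0 < \sum_{i=j}^{\infty} p_i < 1$ for some $j > 1$. Thus

$$\lim_{m \to \infty} \prod_{i=j}^{m} (1 - p_i) \ge \lim_{m \to \infty} \left(1 - \sum_{i=j}^{m} p_i\right) > 0,$$

which contradicts $u_m \to 0$.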
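Two concrete sequences illustrate both directions of the lemma. The Python sketch below is an illustration with sequences of our own choosing: $p_i = 1/(i+2)$, whose sum diverges, and $p_i = 1/(i+2)^2$, whose sum converges. Exact rational arithmetic shows that the first product telescopes to $1/(m+2)$, which tends to 0, while the second equals $(m+3)/(2(m+2))$, which tends to $\tfrac12 > 0$.

```python
from fractions import Fraction


def partial_product(p, m):
    """u_m = prod_{i=0}^{m} (1 - p(i)), computed exactly."""
    prod = Fraction(1)
    for i in range(m + 1):
        prod *= 1 - p(i)
    return prod


def divergent_case(i):
    """p_i = 1/(i+2): the sum of the p_i is infinite."""
    return Fraction(1, i + 2)


def convergent_case(i):
    """p_i = 1/(i+2)^2: the sum of the p_i is finite."""
    return Fraction(1, (i + 2) ** 2)


for m in (8, 98, 998):
    print(m, partial_product(divergent_case, m), partial_product(convergent_case, m))
```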
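State 0 is recurrent when $\prod_{i=0}^{\infty} (1 - p_i) = 0$, or equivalently, when
$\sum_{i=0}^{\infty} p_i = \infty$. The state is positive recurrent when $m_0 = E[R_0 \mid X_0 = 0] < \infty$.
But

$$m_0 = \sum_{k=0}^{\infty} \Pr\{R_0 > k \mid X_0 = 0\} = 1 + \sum_{k=1}^{\infty} \prod_{i=0}^{k-1} (1 - p_i).$$

Thus positive recurrence requires the stronger condition that
$\sum_{k=1}^{\infty} \prod_{i=0}^{k-1} (1 - p_i) < \infty$, and in this case, the stationary probability $\pi_0$ is given by

$$\pi_0 = \frac{1}{m_0} = \frac{1}{1 + \sum_{k=1}^{\infty} \prod_{i=0}^{k-1} (1 - p_i)}.$$

From the equations for the stationary distribution we have

$$
\begin{aligned}
(1 - p_0)\pi_0 &= \pi_1, \\
(1 - p_1)\pi_1 &= \pi_2, \\
(1 - p_2)\pi_2 &= \pi_3, \\
&\ \,\vdots
\end{aligned}
$$

or

$$
\begin{aligned}
\pi_1 &= (1 - p_0)\pi_0, \\
\pi_2 &= (1 - p_1)\pi_1 = (1 - p_1)(1 - p_0)\pi_0, \\
\pi_3 &= (1 - p_2)\pi_2 = (1 - p_2)(1 - p_1)(1 - p_0)\pi_0,
\end{aligned}
$$

and, in general,

$$\pi_k = \pi_0 \prod_{i=0}^{k-1} (1 - p_i) \quad \text{for } k \ge 1.$$

In the special case where $p_i = p = 1 - q$ for $i = 0, 1, \ldots$, then
$\prod_{i=0}^{k-1} (1 - p_i) = q^k$,

$$m_0 = 1 + \sum_{k=1}^{\infty} q^k = \frac{1}{p},$$

so that $\pi_k = p q^k$ for $k = 0, 1, \ldots$.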
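The formula $m_0 = 1 + \sum_{k \ge 1} \prod_{i<k} (1 - p_i)$ is easy to evaluate numerically. The sketch below is illustrative: the function name, the truncation points, and the parameter $p = 0.25$ are our own choices. It reproduces $m_0 = 1/p$ and $\pi_k = p q^k$ for constant $p_i = p$, and shows that for $p_i = 1/(i+2)$ (recurrent, since $\sum p_i = \infty$) the truncated sums keep growing, consistent with null recurrence.

```python
def mean_return_time(p, n_terms):
    """Truncated m_0 = 1 + sum_{k=1}^{n_terms} prod_{i=0}^{k-1} (1 - p(i))."""
    total = 1.0
    prod = 1.0
    for k in range(1, n_terms + 1):
        prod *= 1.0 - p(k - 1)
        total += prod
    return total


p_const = 0.25  # hypothetical success probability
m0 = mean_return_time(lambda i: p_const, 500)
pi_0 = 1.0 / m0
# pi_k = pi_0 * prod_{i<k} (1 - p_i), which reduces to pi_0 * q^k for constant p_i.
pi = [pi_0 * (1.0 - p_const) ** k for k in range(10)]

# p_i = 1/(i+2): the products equal 1/(k+1), so the sum grows like a harmonic series.
slow_1 = mean_return_time(lambda i: 1.0 / (i + 2), 1_000)
slow_2 = mean_return_time(lambda i: 1.0 / (i + 2), 10_000)
print(m0, pi[:3], slow_1, slow_2)
```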


Remark Suppose $a_0, a_1, a_2, \ldots$ is a convergent sequence of real numbers
where $a_n \to a$ as $n \to \infty$. Then it can be proved by elementary methods
that the partial averages of the sequence also converge in the form

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} a_k = a. \tag{4.9}$$

Applying (4.9) with $a_n = P_{ii}^{(n)}$, where $i$ is a member of a positive recurrent
aperiodic class, we obtain

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} P_{ii}^{(m)} = \pi_i = \frac{1}{m_i} > 0, \tag{4.10}$$

where $\pi = (\pi_0, \pi_1, \ldots)$ is the stationary distribution and where $m_i$ is the
mean return time for state $i$. Let $M_i^{(n)}$ be the random variable that counts
the total number of visits to state $i$ during time periods $0, 1, \ldots, n - 1$.
We may write

$$M_i^{(n)} = \sum_{k=0}^{n-1} \mathbf{1}\{X_k = i\}, \tag{4.11}$$

where

$$\mathbf{1}\{X_k = i\} = \begin{cases} 1 & \text{if } X_k = i, \\ 0 & \text{if } X_k \ne i, \end{cases} \tag{4.12}$$

and then see that

$$E[M_i^{(n)} \mid X_0 = i] = \sum_{k=0}^{n-1} E[\mathbf{1}\{X_k = i\} \mid X_0 = i] = \sum_{k=0}^{n-1} P_{ii}^{(k)}. \tag{4.13}$$

Then, referring to (4.10), we have

$$\lim_{n \to \infty} \frac{1}{n} E[M_i^{(n)} \mid X_0 = i] = \frac{1}{m_i}. \tag{4.14}$$

In words, the long run ($n \to \infty$) mean visits to state $i$ per unit time equals
$\pi_i$, the probability of state $i$ under the stationary distribution.

Next, let $r(i)$ define a cost or rate to be accumulated upon each visit to
state $i$. The total cost accumulated during the first $n$ stages is

$$R^{(n-1)} = \sum_{k=0}^{n-1} r(X_k) = \sum_{k=0}^{n-1} \sum_{i} \mathbf{1}\{X_k = i\}\, r(i) = \sum_{i=0}^{\infty} M_i^{(n)} r(i). \tag{4.15}$$

This leads to the following derivation showing that the long run mean cost
per unit time equals the mean cost evaluated over the stationary distribution:

$$\lim_{n \to \infty} \frac{1}{n} E[R^{(n-1)} \mid X_0 = i] = \lim_{n \to \infty} \sum_{i=0}^{\infty} \frac{1}{n} E[M_i^{(n)} \mid X_0 = i]\, r(i) = \sum_{i=0}^{\infty} \pi_i r(i). \tag{4.16}$$

(When the Markov chain has an infinite number of states, then the
derivation requires that a limit and infinite sum be interchanged. A sufficient
condition to justify this interchange is that $r(i)$ be a bounded function
of $i$.)
Remark The Periodic Case If $i$ is a member of a recurrent periodic
irreducible Markov chain with period $d$, one can show that $P_{ii}^{(m)} = 0$ if $m$ is
not a multiple of $d$ (i.e., if $m \ne nd$ for any $n$), and that

$$\lim_{n \to \infty} P_{ii}^{(nd)} = \frac{d}{m_i}.$$

These last two results are easily combined with (4.9) to show that (4.10)
also holds in the periodic case. If $m_i < \infty$, then the chain is positive recurrent
and

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} P_{ii}^{(m)} = \pi_i = \frac{1}{m_i}, \tag{4.17}$$

where $\pi = (\pi_0, \pi_1, \ldots)$ is given as the unique nonnegative solution to

$$\pi_j = \sum_{k=0}^{\infty} \pi_k P_{kj}, \quad j = 0, 1, \ldots,$$

and

$$\sum_{j=0}^{\infty} \pi_j = 1.$$

That is, a unique stationary distribution $\pi = (\pi_0, \pi_1, \ldots)$ exists for a
positive recurrent periodic irreducible Markov chain, and the mean fraction
of time in state $i$ converges to $\pi_i$ as the number of stages $n$ grows to
infinity.


The convergence of (4.17) does not require the chain to start in state $i$.
Under the same conditions,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} P_{ki}^{(m)} = \pi_i = \frac{1}{m_i}$$

holds for all states $k = 0, 1, \ldots$ as well.


Exercises


4.1. Determine the limiting distribution for the Markov chain whose
transition probability matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & 4 \\ \hline
0 & q & p & 0 & 0 & 0 \\
1 & q & 0 & p & 0 & 0 \\
2 & q & 0 & 0 & p & 0 \\
3 & q & 0 & 0 & 0 & p \\
4 & 1 & 0 & 0 & 0 & 0
\end{array},
$$

where $p > 0$, $q > 0$, and $p + q = 1$.


4.2. Consider the Markov chain whose transition probability matrix is
given by

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & 0 & 1 & 0 & 0 \\
1 & 0.1 & 0.4 & 0.2 & 0.3 \\
2 & 0.2 & 0.2 & 0.5 & 0.1 \\
3 & 0.3 & 0.3 & 0.4 & 0
\end{array}.
$$

(a) Determine the limiting probability $\pi_0$ that the process is in state 0.

(b) By pretending that state 0 is absorbing, use a first step analysis (III,
Section 4) and calculate the mean time $m_{10}$ for the process to go
from state 1 to state 0.

(c) Because the process always goes directly to state 1 from state 0, the
mean return time to state 0 is $m_0 = 1 + m_{10}$. Verify equation (4.5),
$\pi_0 = 1/m_0$.


4.3. Determine the stationary distribution for the periodic Markov chain 
whose transition probability matrix is 


O 1 2 3 
oo : Oo § 
1}; 0 3 0 
P=2}}0 4 Oo 3 
314 0 £ 0 
Problems


4.1. Consider the Markov chain on {0, 1} whose transition probability
matrix is

$$
P = \begin{array}{c|cc}
  & 0 & 1 \\ \hline
0 & 1 - \alpha & \alpha \\
1 & \beta & 1 - \beta
\end{array},
\qquad 0 < \alpha, \beta < 1.
$$

(a) Verify that $(\pi_0, \pi_1) = (\beta/(\alpha + \beta),\ \alpha/(\alpha + \beta))$ is a stationary
distribution.

(b) Show that the first return distribution to state 0 is given by $f_{00}^{(1)} =
(1 - \alpha)$ and $f_{00}^{(n)} = \alpha\beta(1 - \beta)^{n-2}$ for $n = 2, 3, \ldots$.

(c) Calculate the mean return time $m_0 = \sum_{n=1}^{\infty} n f_{00}^{(n)}$ and verify that
$\pi_0 = 1/m_0$.

4.2. Determine the stationary distribution for the Markov chain whose 
transition probability matrix is 


0 1 2 3 
O1//0 0 3 3 
L110 0 3 2 
P _ 3 3 
21/4 2 0 0 
311i 2 0 0 
4.3. Consider a random walk Markov chain on states $0, 1, \ldots, N$ with
transition probability matrix

$$
P = \begin{array}{c|cccccccc}
  & 0 & 1 & 2 & 3 & \cdots & N-2 & N-1 & N \\ \hline
0 & 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
1 & q_1 & 0 & p_1 & 0 & \cdots & 0 & 0 & 0 \\
2 & 0 & q_2 & 0 & p_2 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & \ddots & & & \\
N-1 & 0 & 0 & 0 & 0 & \cdots & q_{N-1} & 0 & p_{N-1} \\
N & 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 0
\end{array},
$$

where $p_i + q_i = 1$, $p_i > 0$, $q_i > 0$ for all $i$.
The transition probabilities from states 0 and $N$ "reflect" the process
back into states $1, 2, \ldots, N - 1$. Determine the limiting distribution.


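4.4. Let $\{a_i : i = 1, 2, \ldots\}$ be a probability distribution, and consider
the Markov chain whose transition probability matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & \cdots \\ \hline
0 & a_1 & a_2 & a_3 & a_4 & \cdots \\
1 & 1 & 0 & 0 & 0 & \cdots \\
2 & 0 & 1 & 0 & 0 & \cdots \\
3 & 0 & 0 & 1 & 0 & \cdots \\
\vdots & & & & & \ddots
\end{array}.
$$

What condition on the probability distribution $\{a_i : i = 1, 2, \ldots\}$ is
necessary and sufficient in order that a limiting distribution exist, and what is
this limiting distribution? Assume $a_1 > 0$ and $a_2 > 0$, so that the chain is
aperiodic.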


4.5. Let $P$ be the transition probability matrix of a finite state regular
Markov chain. Let $M = \|m_{ij}\|$ be the matrix of mean return times.

(a) Use a first step argument to establish that

$$m_{ij} = 1 + \sum_{k \ne j} P_{ik} m_{kj}.$$

(b) Multiply both sides of the preceding by $\pi_i$ and sum to obtain

$$\sum_{i} \pi_i m_{ij} = \sum_{i} \pi_i + \sum_{k \ne j} \sum_{i} \pi_i P_{ik} m_{kj}.$$

Simplify this to show (see equation (4.5))

$$\pi_j m_{jj} = 1, \quad \text{or} \quad \pi_j = 1/m_{jj}.$$


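4.6. Determine the period of state 0 in the Markov chain whose transition
probability matrix is

$$
P = \begin{array}{c|cccccccc}
   & 3 & 2 & 1 & 0 & -1 & -2 & -3 & -4 \\ \hline
 3 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
 2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
-2 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
-3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
-4 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{array}.
$$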


4.7. An individual either drives his car or walks in going from his home 
to his office in the morning, and from his office to his home in the after- 
noon. He uses the following strategy: If it is raining in the morning, then 
he drives the car, provided it is at home to be taken. Similarly, if it is raining
in the afternoon and his car is at the office, then he drives the car
home. He walks on any morning or afternoon that it is not raining or the 
car is not where he is. Assume that, independent of the past, it rains dur- 
ing successive mornings and afternoons with constant probability p. In the 
long run, on what fraction of days does our man walk in the rain? What if 
he owns two cars? 


4.8. A Markov chain on states $0, 1, \ldots$ has transition probabilities

$$P_{ij} = \frac{1}{i + 2} \quad \text{for } j = 0, 1, \ldots, i, i + 1.$$


Find the stationary distribution. 


5. Reducible Markov Chains* 


* The overwhelming majority of Markov chains encountered in stochastic modeling are
irreducible. Reducible Markov chains form a specialized topic.


Recall that states $i$ and $j$ communicate if it is possible to reach state $j$
starting from state $i$, and vice versa, and a Markov chain is irreducible if all
pairs of states communicate. In this section we show, mostly by example,
how to analyze more general Markov chains.
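Consider first the Markov chain whose transition probability matrix is

$$
P = \begin{Vmatrix}
\tfrac12 & \tfrac12 & 0 & 0 \\
\tfrac14 & \tfrac34 & 0 & 0 \\
0 & 0 & \tfrac13 & \tfrac23 \\
0 & 0 & \tfrac23 & \tfrac13
\end{Vmatrix},
$$

which we write in the form

$$P = \begin{Vmatrix} P_1 & 0 \\ 0 & P_2 \end{Vmatrix},$$

where

$$
P_1 = \begin{Vmatrix} \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac34 \end{Vmatrix}
\quad \text{and} \quad
P_2 = \begin{Vmatrix} \tfrac13 & \tfrac23 \\ \tfrac23 & \tfrac13 \end{Vmatrix}.
$$

The chain has two communicating classes, the first two states forming one
class and the last two states forming the other. Then

$$
P^2 = \begin{Vmatrix}
\tfrac38 & \tfrac58 & 0 & 0 \\
\tfrac{5}{16} & \tfrac{11}{16} & 0 & 0 \\
0 & 0 & \tfrac59 & \tfrac49 \\
0 & 0 & \tfrac49 & \tfrac59
\end{Vmatrix}
= \begin{Vmatrix} P_1^2 & 0 \\ 0 & P_2^2 \end{Vmatrix},
$$

and, in general,

$$P^n = \begin{Vmatrix} P_1^n & 0 \\ 0 & P_2^n \end{Vmatrix}. \tag{5.1}$$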
Equation (5.1) is the mathematical expression of the property that it is not
possible to communicate back and forth between distinct communicating
classes; once in the first class the process remains there thereafter; and
similarly, once in the second class, the process remains there. In effect,
two completely unrelated processes have been labeled together. The transition
probability matrix $P$ is reducible to the irreducible matrices $P_1$ and
$P_2$. It follows from (5.1) that

$$
\lim_{n \to \infty} P^n = \begin{Vmatrix}
\pi_0^{(1)} & \pi_1^{(1)} & 0 & 0 \\
\pi_0^{(1)} & \pi_1^{(1)} & 0 & 0 \\
0 & 0 & \pi_0^{(2)} & \pi_1^{(2)} \\
0 & 0 & \pi_0^{(2)} & \pi_1^{(2)}
\end{Vmatrix},
$$

where

$$
\lim_{n \to \infty} P_1^n = \begin{Vmatrix} \pi_0^{(1)} & \pi_1^{(1)} \\ \pi_0^{(1)} & \pi_1^{(1)} \end{Vmatrix}
\quad \text{and} \quad
\lim_{n \to \infty} P_2^n = \begin{Vmatrix} \pi_0^{(2)} & \pi_1^{(2)} \\ \pi_0^{(2)} & \pi_1^{(2)} \end{Vmatrix}.
$$

We solve for $\pi^{(1)} = (\pi_0^{(1)}, \pi_1^{(1)})$ and $\pi^{(2)} = (\pi_0^{(2)}, \pi_1^{(2)})$ in the usual way:

$$
\begin{aligned}
\tfrac12 \pi_0^{(1)} + \tfrac14 \pi_1^{(1)} &= \pi_0^{(1)}, \\
\tfrac12 \pi_0^{(1)} + \tfrac34 \pi_1^{(1)} &= \pi_1^{(1)}, \\
\pi_0^{(1)} + \pi_1^{(1)} &= 1,
\end{aligned}
$$

or

$$\pi_0^{(1)} = \tfrac13, \qquad \pi_1^{(1)} = \tfrac23, \tag{5.2}$$

and because $P_2$ is doubly stochastic (see Section 1.1), it follows that
$\pi_0^{(2)} = \tfrac12$, $\pi_1^{(2)} = \tfrac12$.

The basic limit theorem of Markov chains, Theorem 4.1, referred to an
irreducible Markov chain. The limit theorem applies verbatim to any
aperiodic recurrent class in a reducible Markov chain. If $i, j$ are in the same
aperiodic recurrent class, then $P_{ij}^{(n)} \to 1/m_j \ge 0$ as $n \to \infty$. If $i, j$ are in the
same periodic recurrent class, then $n^{-1} \sum_{m=0}^{n-1} P_{ij}^{(m)} \to 1/m_j \ge 0$ as $n \to \infty$.

If $j$ is a transient state, then $P_{jj}^{(n)} \to 0$ as $n \to \infty$, and, more generally,
$P_{ij}^{(n)} \to 0$ as $n \to \infty$ for all initial states $i$.



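In order to complete the discussion of the limiting behavior of $P_{ij}^{(n)}$, we
still must consider the case where $i$ is transient and $j$ is recurrent. Consider
the transition probability matrix

$$
P = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & \tfrac12 & \tfrac12 & 0 & 0 \\
1 & \tfrac14 & \tfrac34 & 0 & 0 \\
2 & \tfrac14 & \tfrac14 & \tfrac14 & \tfrac14 \\
3 & 0 & 0 & 0 & 1
\end{array}.
$$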


There are three classes: {0, 1}, {2}, and {3}, and of these {0, 1} and {3} are 
recurrent, while {2} is transient. Starting from state 2, the process ulti- 
mately gets absorbed in one of the other classes. The question is, Which 
one? or more precisely, What are the probabilities of absorption in the two 
recurrent classes starting from state 2? 

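A first step analysis answers the question. Let $u$ denote the probability
of absorption in class {0, 1} starting from state 2. Then $1 - u$ is the
probability of absorption in class {3}. Conditioning on the first step, we have

$$u = \left(\tfrac14 + \tfrac14\right)(1) + \tfrac14 u + \tfrac14 (0) = \tfrac12 + \tfrac14 u,$$

or $u = \tfrac23$. With probability $\tfrac23$ the process enters {0, 1} and remains there
ever after. The stationary distribution for the recurrent class {0, 1}, computed
in (5.2), is $\pi_0 = \tfrac13$, $\pi_1 = \tfrac23$. Therefore, $\lim_{n \to \infty} P_{20}^{(n)} = \tfrac23 \times \tfrac13 = \tfrac29$,
$\lim_{n \to \infty} P_{21}^{(n)} = \tfrac23 \times \tfrac23 = \tfrac49$. That is, we multiply the probability of entering the
class {0, 1} by the appropriate probabilities under the stationary distribution
for the various states in the class. In matrix form, the limiting behavior
of $P^n$ is given by

$$
\lim_{n \to \infty} P^n = \begin{array}{c|cccc}
  & 0 & 1 & 2 & 3 \\ \hline
0 & \tfrac13 & \tfrac23 & 0 & 0 \\
1 & \tfrac13 & \tfrac23 & 0 & 0 \\
2 & \tfrac29 & \tfrac49 & 0 & \tfrac13 \\
3 & 0 & 0 & 0 & 1
\end{array}.
$$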
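The limiting matrix can be confirmed by brute force. The following Python sketch is an illustration only; it assumes the four-state transition matrix has rows $(\tfrac12, \tfrac12, 0, 0)$, $(\tfrac14, \tfrac34, 0, 0)$, $(\tfrac14, \tfrac14, \tfrac14, \tfrac14)$, $(0, 0, 0, 1)$, which is our reading of the example. It computes the absorption probability $u$ by the first step argument, forms the predicted limit row for state 2, and compares it with row 2 of $P^{64}$ obtained by repeated squaring.

```python
from fractions import Fraction as F

# Assumed transition matrix of the example (states 0, 1, 2, 3).
P = [[F(1, 2), F(1, 2), F(0), F(0)],
     [F(1, 4), F(3, 4), F(0), F(0)],
     [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
     [F(0), F(0), F(0), F(1)]]


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# First step analysis from state 2: u = (P_20 + P_21) + P_22 * u.
u = (P[2][0] + P[2][1]) / (1 - P[2][2])
pi = (F(1, 3), F(2, 3))  # stationary distribution on the class {0, 1}
predicted_row = [u * pi[0], u * pi[1], F(0), 1 - u]

power = P
for _ in range(6):  # six squarings give P^64
    power = mat_mul(power, power)
row_2 = [float(x) for x in power[2]]
print(u, predicted_row, row_2)
```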


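To firm up the principles, consider one last example:

$$
P = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & \tfrac12 & \tfrac12 & 0 & 0 & 0 & 0 \\
1 & \tfrac13 & \tfrac23 & 0 & 0 & 0 & 0 \\
2 & \tfrac13 & 0 & 0 & \tfrac13 & \tfrac16 & \tfrac16 \\
3 & \tfrac16 & \tfrac16 & \tfrac16 & 0 & \tfrac13 & \tfrac16 \\
4 & 0 & 0 & 0 & 0 & 0 & 1 \\
5 & 0 & 0 & 0 & 0 & 1 & 0
\end{array}.
$$

There are three classes: $C_1 = \{0, 1\}$, $C_2 = \{2, 3\}$, and $C_3 = \{4, 5\}$. The
stationary distribution in $C_1$ is $(\pi_0, \pi_1)$, where

$$
\begin{aligned}
\tfrac12 \pi_0 + \tfrac13 \pi_1 &= \pi_0, \\
\tfrac12 \pi_0 + \tfrac23 \pi_1 &= \pi_1, \\
\pi_0 + \pi_1 &= 1.
\end{aligned}
$$

Then $\pi_0 = \tfrac25$ and $\pi_1 = \tfrac35$.

Class $C_3$ is periodic, and $P_{ij}^{(n)}$ does not converge for $i, j$ in $C_3 = \{4, 5\}$.
The time averages do converge, however; and $\lim_{n \to \infty} n^{-1} \sum_{m=0}^{n-1} P_{ij}^{(m)} = \tfrac12$ for
$i = 4, 5$ and $j = 4, 5$.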

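For the transient class $C_2 = \{2, 3\}$, let $u_i$ be the probability of ultimate
absorption in class $C_1 = \{0, 1\}$ starting from state $i$ for $i = 2, 3$. From a
first step analysis, then

$$
\begin{aligned}
u_2 &= \tfrac13 (1) + 0(1) + 0 u_2 + \tfrac13 u_3 + \tfrac16 (0) + \tfrac16 (0), \\
u_3 &= \tfrac16 (1) + \tfrac16 (1) + \tfrac16 u_2 + 0 u_3 + \tfrac13 (0) + \tfrac16 (0),
\end{aligned}
$$

or

$$u_2 = \tfrac13 + \tfrac13 u_3, \qquad u_3 = \tfrac13 + \tfrac16 u_2.$$

The solution is $u_2 = \tfrac{8}{17}$ and $u_3 = \tfrac{7}{17}$. Combining these partial answers in
matrix form, we have

$$
\lim_{n \to \infty} P^n = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & \tfrac25 & \tfrac35 & 0 & 0 & 0 & 0 \\
1 & \tfrac25 & \tfrac35 & 0 & 0 & 0 & 0 \\
2 & (\tfrac{8}{17})(\tfrac25) & (\tfrac{8}{17})(\tfrac35) & 0 & 0 & X & X \\
3 & (\tfrac{7}{17})(\tfrac25) & (\tfrac{7}{17})(\tfrac35) & 0 & 0 & X & X \\
4 & 0 & 0 & 0 & 0 & X & X \\
5 & 0 & 0 & 0 & 0 & X & X
\end{array},
$$

where $X$ denotes that the limit does not exist. For the time average, we
have

$$
\lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} P^m = \begin{array}{c|cccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 \\ \hline
0 & \tfrac25 & \tfrac35 & 0 & 0 & 0 & 0 \\
1 & \tfrac25 & \tfrac35 & 0 & 0 & 0 & 0 \\
2 & (\tfrac{8}{17})(\tfrac25) & (\tfrac{8}{17})(\tfrac35) & 0 & 0 & (\tfrac{9}{17})(\tfrac12) & (\tfrac{9}{17})(\tfrac12) \\
3 & (\tfrac{7}{17})(\tfrac25) & (\tfrac{7}{17})(\tfrac35) & 0 & 0 & (\tfrac{10}{17})(\tfrac12) & (\tfrac{10}{17})(\tfrac12) \\
4 & 0 & 0 & 0 & 0 & \tfrac12 & \tfrac12 \\
5 & 0 & 0 & 0 & 0 & \tfrac12 & \tfrac12
\end{array}.
$$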
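The first step equations for $u_2$ and $u_3$ form a small linear system, which the sketch below solves exactly with rational arithmetic. It is an illustration under assumptions: the coefficients are taken from the equations $u_2 = \tfrac13 + \tfrac13 u_3$, $u_3 = \tfrac13 + \tfrac16 u_2$ as we read them, and the helper `solve_2x2` (Cramer's rule) is our own. It then assembles rows 2 and 3 of the time-average limit and checks that each is a probability distribution.

```python
from fractions import Fraction as F


def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return x0, x1


# Transitions among the transient states 2 and 3 (assumed from the equations above).
Q = [[F(0), F(1, 3)],
     [F(1, 6), F(0)]]
# One-step probabilities of entering C_1 = {0, 1} from states 2 and 3.
b = [F(1, 3) + F(0), F(1, 6) + F(1, 6)]

# First step analysis: u = b + Q u, that is, (I - Q) u = b.
A = [[1 - Q[0][0], -Q[0][1]],
     [-Q[1][0], 1 - Q[1][1]]]
u2, u3 = solve_2x2(A, b)

pi_c1 = (F(2, 5), F(3, 5))   # stationary distribution on C_1
avg_c3 = (F(1, 2), F(1, 2))  # time-average occupation on the periodic class C_3
row_2 = [u2 * pi_c1[0], u2 * pi_c1[1], F(0), F(0), (1 - u2) * avg_c3[0], (1 - u2) * avg_c3[1]]
row_3 = [u3 * pi_c1[0], u3 * pi_c1[1], F(0), F(0), (1 - u3) * avg_c3[0], (1 - u3) * avg_c3[1]]
print(u2, u3, row_2, row_3)
```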


One possible behavior remains to be illustrated. It can occur only when
there are an infinite number of states. In this case it is possible that all
states are transient or null recurrent and $\lim_{n \to \infty} P_{ij}^{(n)} = 0$ for all states $i, j$.
For example, consider the deterministic Markov chain described by
$X_n = X_0 + n$. The transition probability matrix is

$$
P = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3 & \cdots \\ \hline
0 & 0 & 1 & 0 & 0 & \cdots \\
1 & 0 & 0 & 1 & 0 & \cdots \\
2 & 0 & 0 & 0 & 1 & \cdots \\
3 & 0 & 0 & 0 & 0 & \cdots \\
\vdots & & & & & \ddots
\end{array}.
$$

Then all states are transient, and $\lim_{n \to \infty} P_{ij}^{(n)} = \lim_{n \to \infty} \Pr\{X_n = j \mid X_0 = i\} =
0$ for all states $i, j$.

If there is only a finite number $M$ of states, then there are no null
recurrent states, and not all states can be transient. In fact, since
$\sum_{j=0}^{M-1} P_{ij}^{(n)} = 1$ for all $n$, it cannot happen that $\lim_{n \to \infty} P_{ij}^{(n)} = 0$ for all $j$.

Exercises 


5.1. Given the transition matrix 


0123 4 
Ol! 3 000 
iff: 4 00 0 

P=2\10 0 1 0 Of, 
3llo 0 4! 2 0 
4111 00 0 0 


determine the limits, as $n \to \infty$, of $P_{i0}^{(n)}$ for $i = 0, 1, \ldots, 4$.


5.2. Given the transition matrix 


123 4 5 6 7 

Ij; % 00 0 0 0 

2; 2 00 0 0 0 

30 00 %; 3; O O 

P=41/0 01 0 0 0 Off, 

510 0 10 0 0 0 

66 9 6 6 O 4 

710 0 0 0 0 0 1 

derive the following limits, where they exist: 

(a) lim,,_,.. P,”” (e) lim,_,. Ps” 
(b) lim,_.. Ps (f) lim,,.. Px’ 
(c) lim,,_,.. Pe” (g) lim,_,.. Pé 


(d) lim,,_.. Péy (h) lim,.,.. Ps? 


Problems 


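5.1. Describe the limiting behavior of the Markov chain whose transition
probability matrix is

$$
P = \begin{array}{c|cccccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0.1 & 0.1 & 0.2 & 0.2 & 0.1 & 0.1 & 0.1 & 0.1 \\
1 & 0 & 0.1 & 0.1 & 0.1 & 0 & 0.3 & 0.2 & 0.2 \\
2 & 0.6 & 0 & 0 & 0.1 & 0.1 & 0.1 & 0.1 & 0.0 \\
3 & 0 & 0 & 0 & 0.3 & 0.7 & 0 & 0 & 0 \\
4 & 0 & 0 & 0 & 0.7 & 0.3 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 0 & 0.3 & 0.4 & 0.3 \\
6 & 0 & 0 & 0 & 0 & 0 & 0.1 & 0 & 0.9 \\
7 & 0 & 0 & 0 & 0 & 0 & 0.8 & 0.2 & 0
\end{array}.
$$

Hint: First consider the matrices

$$
P_A = \begin{array}{c|ccccc}
  & 0 & 1 & 2 & 3\text{-}4 & 5\text{-}7 \\ \hline
0 & 0.1 & 0.1 & 0.2 & 0.3 & 0.3 \\
1 & 0 & 0.1 & 0.1 & 0.1 & 0.7 \\
2 & 0.6 & 0 & 0 & 0.2 & 0.2 \\
3\text{-}4 & 0 & 0 & 0 & 1 & 0 \\
5\text{-}7 & 0 & 0 & 0 & 0 & 1
\end{array}
$$

and

$$
P_B = \begin{array}{c|cc}
  & 3 & 4 \\ \hline
3 & 0.3 & 0.7 \\
4 & 0.7 & 0.3
\end{array},
\qquad
P_C = \begin{array}{c|ccc}
  & 5 & 6 & 7 \\ \hline
5 & 0.3 & 0.4 & 0.3 \\
6 & 0.1 & 0 & 0.9 \\
7 & 0.8 & 0.2 & 0
\end{array}.
$$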
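5.2. Determine the limiting behavior of the Markov chain whose transition
probability matrix is

$$
P = \begin{array}{c|cccccccc}
  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline
0 & 0.1 & 0.2 & 0.1 & 0.1 & 0.2 & 0.1 & 0.1 & 0.1 \\
1 & 0 & 0.1 & 0.2 & 0.1 & 0 & 0.3 & 0.1 & 0.2 \\
2 & 0.5 & 0 & 0 & 0.2 & 0.1 & 0.1 & 0.1 & 0 \\
3 & 0 & 0 & 0.3 & 0.7 & 0 & 0 & 0 & 0 \\
4 & 0 & 0 & 0.6 & 0.4 & 0 & 0 & 0 & 0 \\
5 & 0 & 0 & 0 & 0 & 0 & 0.3 & 0.4 & 0.3 \\
6 & 0 & 0 & 0 & 0 & 0 & 0.2 & 0.2 & 0.6 \\
7 & 0 & 0 & 0 & 0 & 0 & 0.9 & 0.1 & 0
\end{array}.
$$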


Chapter V 
Poisson Processes 


1. The Poisson Distribution and 
the Poisson Process 


Poisson behavior is so pervasive in natural phenomena and the Poisson
distribution is so amenable to extensive and elaborate analysis as to make
the Poisson process a cornerstone of stochastic modeling.


1.1. The Poisson Distribution 


The Poisson distribution with parameter $\mu > 0$ is given by

$$p_k = \frac{e^{-\mu} \mu^k}{k!} \quad \text{for } k = 0, 1, \ldots. \tag{1.1}$$

Let $X$ be a random variable having the Poisson distribution in (1.1). We
evaluate the mean, or first moment, via

$$E[X] = \sum_{k=0}^{\infty} k p_k = \sum_{k=1}^{\infty} \frac{k e^{-\mu} \mu^k}{k!} = \mu e^{-\mu} \sum_{k=1}^{\infty} \frac{\mu^{k-1}}{(k-1)!} = \mu.$$

To evaluate the variance, it is easier first to determine

$$E[X(X - 1)] = \sum_{k=2}^{\infty} k(k - 1) p_k = \mu^2 e^{-\mu} \sum_{k=2}^{\infty} \frac{\mu^{k-2}}{(k-2)!} = \mu^2.$$

Then

$$E[X^2] = E[X(X - 1)] + E[X] = \mu^2 + \mu,$$

while

$$\sigma_X^2 = \operatorname{Var}[X] = E[X^2] - \{E[X]\}^2 = \mu^2 + \mu - \mu^2 = \mu.$$

Thus the Poisson distribution has the unusual characteristic that both the
mean and the variance are given by the same value $\mu$.

Two fundamental properties of the Poisson distribution, which will 
arise later in a variety of forms, concern the sum of independent Poisson 
random variables and certain random decompositions of Poisson phe- 
nomena. We state these properties formally as Theorems 1.1 and 1.2. 


Theorem 1.1. Let $X$ and $Y$ be independent random variables having
Poisson distributions with parameters $\mu$ and $\nu$, respectively. Then the sum
$X + Y$ has a Poisson distribution with parameter $\mu + \nu$.


Proof By the law of total probability,

$$
\begin{aligned}
\Pr\{X + Y = n\} &= \sum_{k=0}^{n} \Pr\{X = k, Y = n - k\} \\
&= \sum_{k=0}^{n} \Pr\{X = k\} \Pr\{Y = n - k\} \qquad \text{($X$ and $Y$ are independent)} \\
&= \sum_{k=0}^{n} \left\{\frac{\mu^k e^{-\mu}}{k!}\right\} \left\{\frac{\nu^{n-k} e^{-\nu}}{(n - k)!}\right\} \\
&= \frac{e^{-(\mu + \nu)}}{n!} \sum_{k=0}^{n} \frac{n!}{k!(n - k)!} \mu^k \nu^{n-k}.
\end{aligned} \tag{1.2}
$$

The binomial expansion of $(\mu + \nu)^n$ is, of course,

$$(\mu + \nu)^n = \sum_{k=0}^{n} \frac{n!}{k!(n - k)!} \mu^k \nu^{n-k},$$

and so (1.2) simplifies to

$$\Pr\{X + Y = n\} = \frac{e^{-(\mu + \nu)} (\mu + \nu)^n}{n!}, \quad n = 0, 1, \ldots,$$

the desired Poisson distribution.
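The convolution identity in the proof can be checked numerically. The Python sketch below is illustrative only; the parameter values $\mu = 1.5$ and $\nu = 2.5$ and the function names are our own. It evaluates the sum over $k$ directly and compares it with the Poisson probability with parameter $\mu + \nu$.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def convolution_pmf(n, mu, nu):
    """Pr{X + Y = n} for independent Poisson X (mean mu) and Y (mean nu)."""
    return sum(poisson_pmf(k, mu) * poisson_pmf(n - k, nu) for k in range(n + 1))


mu, nu = 1.5, 2.5  # hypothetical parameters
for n in range(6):
    print(n, convolution_pmf(n, mu, nu), poisson_pmf(n, mu + nu))
```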
To describe the second result, we consider first a Poisson random variable
$N$ where the parameter is $\mu > 0$. Write $N$ as a sum of ones in the form

$$N = \underbrace{1 + 1 + \cdots + 1}_{N \text{ ones}},$$

and next, considering each one separately and independently, erase it with
probability $1 - p$ and keep it with probability $p$. What is the distribution
of the resulting sum $M$, of the form $M = 1 + 0 + 0 + 1 + \cdots + 1$?

The next theorem states and answers the question in a more precise
wording.


Theorem 1.2 Let $N$ be a Poisson random variable with parameter $\mu$,
and conditional on $N$, let $M$ have a binomial distribution with parameters
$N$ and $p$. Then the unconditional distribution of $M$ is Poisson with parameter
$\mu p$.


Proof The verification proceeds via a direct application of the law of
total probability. Then

$$
\begin{aligned}
\Pr\{M = k\} &= \sum_{n=0}^{\infty} \Pr\{M = k \mid N = n\} \Pr\{N = n\} \\
&= \sum_{n=k}^{\infty} \left\{\frac{n!}{k!(n - k)!} p^k (1 - p)^{n-k}\right\} \left\{\frac{\mu^n e^{-\mu}}{n!}\right\} \\
&= \frac{e^{-\mu} (\mu p)^k}{k!} \sum_{n=k}^{\infty} \frac{[\mu(1 - p)]^{n-k}}{(n - k)!} \\
&= \frac{e^{-\mu} (\mu p)^k}{k!}\, e^{\mu(1 - p)} \\
&= \frac{e^{-\mu p} (\mu p)^k}{k!} \quad \text{for } k = 0, 1, \ldots,
\end{aligned}
$$

which is the claimed Poisson distribution.
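The law-of-total-probability sum in this proof can also be evaluated numerically. In the sketch below, which is an illustration only, the values $\mu = 3$ and $p = 0.4$ are hypothetical and the infinite sum over $n$ is truncated at $n = 100$, where the Poisson tail is negligible for this $\mu$. The result is compared with the Poisson probabilities with parameter $\mu p$.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def thinned_pmf(k, mu, p, n_max=100):
    """Pr{M = k} = sum_n Pr{M = k | N = n} Pr{N = n}, truncated at n_max."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k) * poisson_pmf(n, mu)
        for n in range(k, n_max + 1)
    )


mu, p = 3.0, 0.4  # hypothetical parameters
for k in range(5):
    print(k, thinned_pmf(k, mu, p), poisson_pmf(k, mu * p))
```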


1.2. The Poisson Process 


The Poisson process entails notions of both independence and the Poisson 
distribution. 


Definition A Poisson process of intensity, or rate, $\lambda > 0$ is an integer-valued
stochastic process $\{X(t);\ t \ge 0\}$ for which

(i) for any time points $t_0 = 0 < t_1 < t_2 < \cdots < t_n$, the process
increments

$$X(t_1) - X(t_0),\ X(t_2) - X(t_1),\ \ldots,\ X(t_n) - X(t_{n-1})$$

are independent random variables;

(ii) for $s \ge 0$ and $t > 0$, the random variable $X(s + t) - X(s)$ has the
Poisson distribution

$$\Pr\{X(s + t) - X(s) = k\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \quad \text{for } k = 0, 1, \ldots;$$

(iii) $X(0) = 0$.

In particular, observe that if $X(t)$ is a Poisson process of rate $\lambda > 0$, then
the moments are

$$E[X(t)] = \lambda t \quad \text{and} \quad \operatorname{Var}[X(t)] = \sigma_{X(t)}^2 = \lambda t.$$
Example Defects occur along an undersea cable according to a Poisson
process of rate $\lambda = 0.1$ per mile. (a) What is the probability that no defects
appear in the first two miles of cable? (b) Given that there are no
defects in the first two miles of cable, what is the conditional probability
of no defects between mile points two and three? To answer (a) we observe
that $X(2)$ has a Poisson distribution whose parameter is $(0.1)(2) =
0.2$. Thus $\Pr\{X(2) = 0\} = e^{-0.2} = 0.8187$. In part (b) we use the independence
of $X(3) - X(2)$ and $X(2) - X(0) = X(2)$. Thus the conditional probability
is the same as the unconditional probability, and

$$\Pr\{X(3) - X(2) = 0\} = \Pr\{X(1) = 0\} = e^{-0.1} = 0.9048.$$


Example Customers arrive in a certain store according to a Poisson
process of rate $\lambda = 4$ per hour. Given that the store opens at 9:00 A.M.,
what is the probability that exactly one customer has arrived by 9:30 and
a total of five have arrived by 11:30 A.M.?

Measuring time $t$ in hours from 9:00 A.M., we are asked to determine
$\Pr\{X(\tfrac12) = 1, X(\tfrac52) = 5\}$. We use the independence of $X(\tfrac52) - X(\tfrac12)$ and $X(\tfrac12)$
to reformulate the question thus:

$$
\begin{aligned}
\Pr\{X(\tfrac12) = 1, X(\tfrac52) = 5\} &= \Pr\{X(\tfrac12) = 1, X(\tfrac52) - X(\tfrac12) = 4\} \\
&= \left\{\frac{e^{-4(1/2)}\, 4(\tfrac12)}{1!}\right\} \left\{\frac{e^{-4(2)} [4(2)]^4}{4!}\right\} \\
&= (2e^{-2})\left(\tfrac{512}{3} e^{-8}\right) = 0.0154965.
\end{aligned}
$$
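Both examples reduce to evaluating Poisson probabilities for increments over disjoint intervals and multiplying them. A short Python sketch (illustrative; the variable names are ours) reproduces the three numerical answers quoted above.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# Cable example: rate 0.1 defects per mile.
no_defects_first_two_miles = poisson_pmf(0, 0.1 * 2)   # Pr{X(2) = 0}
no_defects_third_mile = poisson_pmf(0, 0.1 * 1)        # Pr{X(3) - X(2) = 0}

# Store example: rate 4 customers per hour, intervals (0, 1/2] and (1/2, 5/2].
one_by_930_and_five_by_1130 = poisson_pmf(1, 4 * 0.5) * poisson_pmf(4, 4 * 2.0)

print(round(no_defects_first_two_miles, 4),
      round(no_defects_third_mile, 4),
      round(one_by_930_and_five_by_1130, 7))
```

The product in the last line is legitimate only because the two intervals are disjoint, so that the corresponding increments are independent.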


1.3. Nonhomogeneous Processes 


The rate $\lambda$ in a Poisson process $X(t)$ is the proportionality constant in the
probability of an event occurring during an arbitrarily small interval. To
explain this more precisely,

$$
\begin{aligned}
\Pr\{X(t + h) - X(t) = 1\} &= \frac{(\lambda h) e^{-\lambda h}}{1!} \\
&= (\lambda h)\left(1 - \lambda h + \tfrac12 \lambda^2 h^2 - \cdots\right) \\
&= \lambda h + o(h),
\end{aligned}
$$

where $o(h)$ denotes a general and unspecified remainder term of smaller
order than $h$.

It is pertinent in many applications to consider rates $\lambda = \lambda(t)$ that vary
with time. Such a process is termed a nonhomogeneous or nonstationary
Poisson process to distinguish it from the stationary, or homogeneous,
process that we primarily consider. If $X(t)$ is a nonhomogeneous Poisson
process with rate $\lambda(t)$, then an increment $X(t) - X(s)$, giving the number
of events in an interval $(s, t]$, has a Poisson distribution with parameter
$\int_s^t \lambda(u)\,du$, and increments over disjoint intervals are independent random
variables.


Example  Demands on a first aid facility in a certain location occur
according to a nonhomogeneous Poisson process having the rate function

$$\lambda(t) = \begin{cases} 2t & \text{for } 0 \le t < 1, \\ 2 & \text{for } 1 \le t < 2, \\ 4 - t & \text{for } 2 \le t \le 4, \end{cases}$$

where t is measured in hours from the opening time of the facility. What is
the probability that two demands occur in the first two hours of operation
and two in the second two hours? Since demands during disjoint intervals
are independent random variables, we can answer the two questions
separately. The mean for the first two hours is
$\mu = \int_0^1 2t\,dt + \int_1^2 2\,dt = 3$, and thus

$$\Pr\{X(2) = 2\} = \frac{e^{-3}(3)^2}{2!} = 0.2240.$$

For the second two hours, $\mu = \int_2^4 (4 - t)\,dt = 2$, and

$$\Pr\{X(4) - X(2) = 2\} = \frac{e^{-2}(2)^2}{2!} = 0.2707.$$

Let X(t) be a nonhomogeneous Poisson process of rate $\lambda(t) > 0$ and
define $\Lambda(t) = \int_0^t \lambda(u)\,du$. Make a deterministic change in the time scale and
define a new process Y(s) = X(t), where $s = \Lambda(t)$. Observe that
$\Delta s = \lambda(t)\Delta t + o(\Delta t)$. Then

$$\Pr\{Y(s + \Delta s) - Y(s) = 1\} = \Pr\{X(t + \Delta t) - X(t) = 1\} = \lambda(t)\Delta t + o(\Delta t) = \Delta s + o(\Delta s),$$

so that Y(s) is a homogeneous Poisson process of unit rate. By this means,
questions about nonhomogeneous Poisson processes can be transformed
into corresponding questions about homogeneous processes. For this
reason we concentrate our exposition on the latter.
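The first aid example can be checked numerically. The following is only a sketch: the helper names `rate`, `mean_count`, and `poisson_pmf` are illustrative choices, not part of the text, and the integral of the rate function is approximated by a midpoint rule rather than evaluated in closed form.

```python
import math


def rate(t):
    """Rate function lambda(t) of the first aid example; t is in hours."""
    if 0 <= t < 1:
        return 2.0 * t
    if 1 <= t < 2:
        return 2.0
    if 2 <= t <= 4:
        return 4.0 - t
    return 0.0


def mean_count(a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of rate(u) over (a, b]."""
    h = (b - a) / steps
    return h * sum(rate(a + (i + 0.5) * h) for i in range(steps))


def poisson_pmf(k, mu):
    """Pr{N = k} for a Poisson random variable N with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


mu_first = mean_count(0.0, 2.0)    # mean number of demands in (0, 2]
mu_second = mean_count(2.0, 4.0)   # mean number of demands in (2, 4]
p_first = poisson_pmf(2, mu_first)
p_second = poisson_pmf(2, mu_second)
p_both = p_first * p_second        # increments over disjoint intervals are independent

print(f"first two hours:  mean {mu_first:.4f}, Pr = {p_first:.4f}")
print(f"second two hours: mean {mu_second:.4f}, Pr = {p_second:.4f}")
print(f"both events:      Pr = {p_both:.4f}")
```

The two separate probabilities should agree with the values 0.2240 and 0.2707 in the example; the last line multiplies them, which is justified only because the two intervals are disjoint.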


1.4 Cox Processes 


Suppose that X(t) is a nonhomogeneous Poisson process, but where the
rate function $\{\lambda(t), t \ge 0\}$ is itself a stochastic process. Such processes
were introduced in 1955 as models for fibrous threads by Sir David Cox, 
who called them doubly stochastic Poisson processes. Now they are most 
often referred to as Cox processes in honor of their discoverer. Since their 
introduction, Cox processes have been used to model a myriad of phe- 
nomena, e.g., bursts of rainfall, where the likelihood of rain may vary with 
the season; inputs to a queueing system, where the rate of input varies 
over time, depending on changing and unmeasured factors; and defects 
along a fiber, where the rate and type of defect may change due to varia- 
tions in material or manufacture. As these applications suggest, the 
process increments over disjoint intervals are, in general, statistically de- 
pendent in a Cox process, as contrasted with their postulated indepen- 
dence in a Poisson process. 

Let $\{X(t); t \ge 0\}$ be a Poisson process of constant rate $\lambda = 1$. The very
simplest Cox process, sometimes called a mixed Poisson process, involves
choosing a single random variable $\Theta$, and then observing the
process $X'(t) = X(\Theta t)$. Given $\Theta$, then X' is, conditionally, a Poisson
process of constant rate $\lambda = \Theta$, but $\Theta$ is random, and typically,
unobservable. If $\Theta$ is a continuous random variable with probability density
function $f(\theta)$, then, upon removing the condition via the law of total
probability, we obtain the marginal distribution

$$\Pr\{X'(t) = k\} = \int_0^\infty \frac{(\theta t)^k e^{-\theta t}}{k!}\, f(\theta)\, d\theta. \qquad (1.3)$$

Problem 1.12 calls for carrying out the integration in (1.3) in the
particular instance in which $\Theta$ has an exponential density.

VI, Section 7 develops a model for defects along a fiber in which a
Markov chain in continuous time is the random intensity function for a
Poisson process. A variety of functionals are evaluated for the resulting
Cox process.
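As a numerical illustration of (1.3), the sketch below integrates the mixture by a midpoint rule for the unit exponential density $f(\theta) = e^{-\theta}$ and compares the result with the closed form $(t/(1+t))^k/(1+t)$ quoted in Problem 1.12. The function names, the truncation point `upper`, and the step count are assumptions made for the illustration.

```python
import math


def mixed_poisson_pmf_numeric(k, t, upper=60.0, steps=60_000):
    """Midpoint-rule approximation of (1.3) with f(theta) = exp(-theta).

    The integral over (0, infinity) is truncated at `upper`; the neglected
    tail is negligible because the integrand decays like exp(-(1 + t) * theta).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        conditional = (theta * t) ** k * math.exp(-theta * t) / math.factorial(k)
        total += conditional * math.exp(-theta)
    return total * h


def mixed_poisson_pmf_closed(k, t):
    """Closed form stated in Problem 1.12(a) for the exponential mixing density."""
    return (t / (1.0 + t)) ** k / (1.0 + t)


t = 2.0
for k in range(4):
    numeric = mixed_poisson_pmf_numeric(k, t)
    closed = mixed_poisson_pmf_closed(k, t)
    print(k, round(numeric, 6), round(closed, 6))
```

The closed form is a geometric distribution in k, so the marginal law of X'(t) is not Poisson even though X' is conditionally Poisson given $\Theta$.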


Exercises 


1.1. Defects occur along the length of a filament at a rate of $\lambda = 2$ per
foot.


(a) Calculate the probability that there are no defects in the first foot of 
the filament. 

(b) Calculate the conditional probability that there are no defects in the 
second foot of the filament, given that the first foot contained a sin- 
gle defect. 


1.2. Let $p_k = \Pr\{X = k\}$ be the probability mass function corresponding
to a Poisson distribution with parameter $\lambda$. Verify that $p_0 = \exp\{-\lambda\}$, and
that $p_k$ may be computed recursively by $p_k = (\lambda/k)p_{k-1}$.


1.3. Let X and Y be independent Poisson distributed random variables
with parameters $\alpha$ and $\beta$, respectively. Determine the conditional
distribution of X, given that N = X + Y = n.


1.4. Customers arrive at a service facility according to a Poisson process
of rate $\lambda$ customers per hour. Let X(t) be the number of customers that have
arrived up to time t.


(a) What is $\Pr\{X(t) = k\}$ for k = 0, 1, ...?
(b) Consider fixed times 0 < s < t. Determine the conditional
probability $\Pr\{X(t) = n + k \mid X(s) = n\}$ and the expected value $E[X(t)X(s)]$.


1.5. Suppose that a random variable X is distributed according to a
Poisson distribution with parameter $\lambda$. The parameter $\lambda$ is itself a random
variable, exponentially distributed with density $f(x) = \theta e^{-\theta x}$ for $x \ge 0$. Find
the probability mass function for X.


1.6. Messages arrive at a telegraph office as a Poisson process with 
mean rate of 3 messages per hour. 


(a) What is the probability that no messages arrive during the morning 
hours 8:00 A.M. to noon? 
(b) What is the distribution of the time at which the first afternoon
message arrives?


1.7. Suppose that customers arrive at a facility according to a Poisson
process having rate $\lambda = 2$. Let X(t) be the number of customers that have
arrived up to time t. Determine the following probabilities and conditional
probabilities:


(a) $\Pr\{X(1) = 2\}$.
(b) $\Pr\{X(1) = 2 \text{ and } X(3) = 6\}$.
(c) $\Pr\{X(1) = 2 \mid X(3) = 6\}$.
(d) $\Pr\{X(3) = 6 \mid X(1) = 2\}$.


1.8. Let $\{X(t); t \ge 0\}$ be a Poisson process having rate parameter $\lambda = 2$.
Determine the numerical values to two decimal places for the following
probabilities:


(a) $\Pr\{X(1) \le 2\}$.
(b) $\Pr\{X(1) = 1 \text{ and } X(2) = 3\}$.
(c) $\Pr\{X(1) \ge 2 \mid X(1) \ge 1\}$.


1.9. Let $\{X(t); t \ge 0\}$ be a Poisson process having rate parameter $\lambda = 2$.
Determine the following expectations:


(a) $E[X(2)]$.
(b) $E[\{X(1)\}^2]$.
(c) $E[X(1)X(2)]$.


Problems 


1.1. Let $\xi_1, \xi_2, \ldots$ be independent random variables, each having an
exponential distribution with parameter $\lambda$. Define a new random variable
X as follows: If $\xi_1 > 1$, then X = 0; if $\xi_1 \le 1$ but $\xi_1 + \xi_2 > 1$, then set
X = 1; in general, set X = k if

$$\xi_1 + \cdots + \xi_k \le 1 < \xi_1 + \cdots + \xi_k + \xi_{k+1}.$$

Show that X has a Poisson distribution with parameter $\lambda$. (Thus the
method outlined can be used to simulate a Poisson distribution.)

Hint: $\xi_1 + \cdots + \xi_k$ has a gamma density

$$f_k(x) = \frac{\lambda^k x^{k-1}}{(k-1)!}\, e^{-\lambda x} \quad \text{for } x > 0.$$

Condition on $\xi_1 + \cdots + \xi_k$ and use the law of total probability to show

$$\Pr\{X = k\} = \int_0^1 [1 - F(1 - x)]\, f_k(x)\, dx,$$

where F(x) is the exponential distribution function.
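The simulation method mentioned in Problem 1.1 can be sketched as follows. This is an illustration only and does not replace the requested proof; the function name, the seed, the rate value, and the sample size are arbitrary assumptions.

```python
import random


def poisson_by_exponential_sums(lam, rng):
    """Return k such that xi_1 + ... + xi_k <= 1 < xi_1 + ... + xi_{k+1},

    where the xi's are independent exponential variables with parameter lam.
    """
    total = rng.expovariate(lam)   # xi_1
    k = 0
    while total <= 1.0:
        k += 1
        total += rng.expovariate(lam)
    return k


rng = random.Random(12345)         # fixed seed so the run is reproducible
lam = 3.0
samples = [poisson_by_exponential_sums(lam, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (len(samples) - 1)
print(f"sample mean {sample_mean:.3f}, sample variance {sample_var:.3f}")
```

For a Poisson distribution the mean and the variance both equal the parameter, so both printed values should be close to `lam`.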


1.2. Suppose that minor defects are distributed over the length of a
cable as a Poisson process with rate $\alpha$, and that, independently, major
defects are distributed over the cable according to a Poisson process of rate
$\beta$. Let X(t) be the number of defects, either major or minor, in the cable
up to length t. Argue that X(t) must be a Poisson process of rate $\alpha + \beta$.


1.3. The generating function of a probability mass function
$p_k = \Pr\{X = k\}$, for k = 0, 1, ..., is defined by

$$g_X(s) = E[s^X] = \sum_{k=0}^{\infty} p_k s^k \quad \text{for } |s| < 1.$$

Show that the generating function for a Poisson random variable X with
mean $\mu$ is given by

$$g_X(s) = e^{-\mu(1-s)}.$$


1.4. (Continuation) Let X and Y be independent random variables,
Poisson distributed with parameters $\alpha$ and $\beta$, respectively. Show that the
generating function of their sum N = X + Y is given by

$$g_N(s) = e^{-(\alpha+\beta)(1-s)}.$$

Hint: Verify and use the fact that the generating function of a sum of
independent random variables is the product of their respective
generating functions. See III, 9.2.


1.5. For each value of h > 0, let X(h) have a Poisson distribution with
parameter $\lambda h$. Let $p_k(h) = \Pr\{X(h) = k\}$ for k = 0, 1, .... Verify that

$$\lim_{h \to 0} \frac{1 - p_0(h)}{h} = \lambda, \quad \text{or } p_0(h) = 1 - \lambda h + o(h);$$

$$\lim_{h \to 0} \frac{p_1(h)}{h} = \lambda, \quad \text{or } p_1(h) = \lambda h + o(h);$$

$$\lim_{h \to 0} \frac{p_2(h)}{h} = 0, \quad \text{or } p_2(h) = o(h).$$

Here o(h) stands for any remainder term of order less than h as $h \to 0$.


1.6. Let $\{X(t); t \ge 0\}$ be a Poisson process of rate $\lambda$. For s, t > 0,
determine the conditional distribution of X(t), given that X(t + s) = n.


1.7. Shocks occur to a system according to a Poisson process of rate $\lambda$.
Suppose that the system survives each shock with probability $\alpha$,
independently of other shocks, so that its probability of surviving k shocks is $\alpha^k$.
What is the probability that the system is surviving at time t?


1.8. Find the probability $\Pr\{X(t) = 1, 3, 5, \ldots\}$ that a Poisson process
having rate $\lambda$ is odd.


1.9. Arrivals of passengers at a bus stop form a Poisson process X(t)
with rate $\lambda = 2$ per unit time. Assume that a bus departed at time t = 0
leaving no customers behind. Let T denote the arrival time of the next bus.
Then the number of passengers present when it arrives is X(T). Suppose
that the bus arrival time T is independent of the Poisson process and that
T has the uniform probability density function

$$f_T(t) = \begin{cases} 1 & \text{for } 0 \le t \le 1, \\ 0 & \text{elsewhere.} \end{cases}$$

(a) Determine the conditional moments $E[X(T) \mid T = t]$ and
$E[\{X(T)\}^2 \mid T = t]$.
(b) Determine the mean $E[X(T)]$ and variance $\mathrm{Var}[X(T)]$.


1.10. Customers arrive at a facility at random according to a Poisson
process of rate $\lambda$. There is a waiting time cost of c per customer per unit
time. The customers gather at the facility and are processed or dispatched
in groups at fixed times T, 2T, 3T, .... There is a dispatch cost of K. The
process is depicted in the following graph.


Figure 1.1. The number of customers in a dispatching system as a
function of time. (Vertical axis: number of customers waiting; horizontal
axis: time, with dispatch epochs marked at 0, T, and 2T.)


(a) What is the total dispatch cost during the first cycle from time 0 to 
time T? 

(b) What is the mean total customer waiting cost during the first cycle? 

(c) What is the mean total customer waiting + dispatch cost per unit 
time during the first cycle? 

(d) What value of T minimizes this mean cost per unit time? 


1.11. Assume that a device fails when a cumulative effect of k shocks
occurs. If the shocks happen according to a Poisson process of parameter
$\lambda$, what is the density function for the life T of the device?


1.12. Consider the mixed Poisson process of Section 1.4, and suppose
that the mixing parameter $\Theta$ has the exponential density $f(\theta) = e^{-\theta}$ for
$\theta > 0$.


(a) Show that equation (1.3) becomes

$$\Pr\{X'(t) = j\} = \left(\frac{t}{1+t}\right)^{j}\left(\frac{1}{1+t}\right), \quad \text{for } j = 0, 1, \ldots.$$

(b) Show that

$$\Pr\{X'(t) = j,\ X'(t+s) = j+k\} = \binom{j+k}{j}\, t^j s^k \left(\frac{1}{1+s+t}\right)^{j+k+1},$$

so that X'(t) and the increment X'(t + s) - X'(t) are not independent
random variables, in contrast to the simple Poisson process as defined in
Section 1.2.


2. The Law of Rare Events 


The common occurrence of the Poisson distribution in nature is explained 
by the law of rare events. Informally, this law asserts that where a certain 
event may occur in any of a large number of possibilities, but where the 
probability that the event does occur in any given possibility is small, then 
the total number of events that do happen should follow, approximately, 
the Poisson distribution. 

A more formal statement in a particular instance follows. Consider a
large number N of independent Bernoulli trials where the probability p of
success on each trial is small and constant from trial to trial. Let $X_{N,p}$
denote the total number of successes in the N trials, where $X_{N,p}$ follows the
binomial distribution

$$\Pr\{X_{N,p} = k\} = \frac{N!}{k!(N-k)!}\, p^k (1-p)^{N-k} \quad \text{for } k = 0, \ldots, N. \qquad (2.1)$$

Now let us consider the limiting case in which $N \to \infty$ and $p \to 0$ in
such a way that $Np = \mu > 0$, where $\mu$ is constant. It is a familiar fact (see
I, Section 3) that the distribution for $X_{N,p}$ becomes, in the limit, the
Poisson distribution

$$\Pr\{X_\mu = k\} = \frac{e^{-\mu}\mu^k}{k!} \quad \text{for } k = 0, 1, \ldots. \qquad (2.2)$$

This form of the law of rare events is stated as a limit. In stochastic 
modeling, the law is used to suggest circumstances under which one 
might expect the Poisson distribution to prevail, at least approximately. 
For example, a large number of cars may pass through a given stretch of 
highway on any particular day. The probability that any specified car is in 
an accident is, we hope, small. Therefore, one might expect that the actual 
number of accidents on a given day along that stretch of highway would
be, at least approximately, Poisson distributed.

While we have formally stated this form of the law of rare events as a
mathematical limit, in older texts, (2.2) is often called "the Poisson
approximation" to the binomial, the idea being that when N is large and p is
small, the binomial probability (2.1) may be approximately evaluated by
the Poisson probability (2.2) with $\mu = Np$. With today's computing
power, exact binomial probabilities are not difficult to obtain, so there is
little need to approximate them with Poisson probabilities. Such is not the
case if the problem is altered slightly by allowing the probability of
success to vary from trial to trial. To examine this proposal in detail, let
$\epsilon_1, \epsilon_2, \ldots$ be independent Bernoulli random variables, where

$$\Pr\{\epsilon_i = 1\} = p_i \quad \text{and} \quad \Pr\{\epsilon_i = 0\} = 1 - p_i,$$

and let $S_n = \epsilon_1 + \cdots + \epsilon_n$. When $p_1 = p_2 = \cdots = p$, then $S_n$ has a
binomial distribution, and the probability $\Pr\{S_n = k\}$ for some k = 0, 1, ... is
easily computed. It is not so easily evaluated when the p's are unequal, the
binomial formula generalizing to

$$\Pr\{S_n = k\} = {\sum}^{(k)} \prod_{i=1}^{n} p_i^{x_i}(1 - p_i)^{1 - x_i}, \qquad (2.3)$$

where ${\sum}^{(k)}$ denotes the sum over all 0, 1 valued $x_i$'s such that
$x_1 + \cdots + x_n = k$.

Fortunately, Poisson approximation may still prove accurate and allow
the computational challenges presented by equation (2.3) to be avoided.


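Theorem 2.1  Let $\epsilon_1, \epsilon_2, \ldots$ be independent Bernoulli random
variables, where

$$\Pr\{\epsilon_i = 1\} = p_i \quad \text{and} \quad \Pr\{\epsilon_i = 0\} = 1 - p_i,$$

and let $S_n = \epsilon_1 + \cdots + \epsilon_n$. The exact probabilities for $S_n$, determined using
(2.3), and Poisson probabilities with $\mu = p_1 + \cdots + p_n$ differ by at most

$$\left|\Pr\{S_n = k\} - \frac{\mu^k e^{-\mu}}{k!}\right| \le \sum_{i=1}^{n} p_i^2. \qquad (2.4)$$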


Not only does the inequality of Theorem 2.1 extend the law of rare events
to the case of unequal p's, it also directly confronts the approximation
issue by providing a numerical measure of the approximation error. Thus
the Poisson distribution provides a good approximation to the exact
probabilities whenever the $p_i$'s are uniformly small as measured by the right
side of (2.4). For instance, when $p_1 = p_2 = \cdots = \mu/n$, then the right side
of (2.4) reduces to $\mu^2/n$, which is small when n is large, and thus (2.4)
provides another means of obtaining the Poisson distribution (2.2) as a limit
of the binomial probabilities (2.1).

We defer the proof of Theorem 2.1 to the end of this section, choosing 
to concentrate now on its implications. As an immediate consequence, for 
example, in the context of the earlier car accident vignette, we see that the 
individual cars need not all have the same accident probabilities in order 
for the Poisson approximation to apply. 
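The bound (2.4) can be examined numerically. In the sketch below the exact distribution of $S_n$ is built trial by trial with a simple dynamic-programming recursion, which gives the same numbers as the sum (2.3) without enumerating every 0, 1 vector. The success probabilities in `ps` are hypothetical values chosen only for illustration.

```python
import math


def poisson_binomial_pmf(ps):
    """Exact distribution of S_n = eps_1 + ... + eps_n for unequal p_i.

    pmf[k] holds Pr{S = k} after the trials processed so far; each new
    trial either leaves the count unchanged or increases it by one.
    """
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf


def poisson_pmf(k, mu):
    """Poisson probability mu^k e^{-mu} / k!."""
    return math.exp(-mu) * mu**k / math.factorial(k)


ps = [0.01, 0.03, 0.02, 0.05, 0.04]   # hypothetical success probabilities
mu = sum(ps)
exact = poisson_binomial_pmf(ps)
bound = sum(p * p for p in ps)        # right side of (2.4)
worst = max(abs(exact[k] - poisson_pmf(k, mu)) for k in range(len(ps) + 1))

print(f"largest gap {worst:.6f}, bound {bound:.6f}")
```

In examples like this one the largest observed gap is typically well below the bound, which is a worst-case guarantee rather than an estimate of the error.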


2.1 The Law of Rare Events and the Poisson Process 


Consider events occurring along the positive axis $[0, \infty)$ in the manner
shown in Figure 2.1. Concrete examples of such processes are the time
points of the x-ray emissions of a substance undergoing radioactive decay, 
the instances of telephone calls originating in a given locality, the occur- 
rence of accidents at a certain intersection, the location of faults or defects 
along the length of a fiber or filament, and the successive arrival times of 
customers for service. 

Let N((a, b]) denote the number of events that occur during the interval
(a, b]. That is, if $t_1 < t_2 < t_3 < \cdots$ denote the times (or locations, etc.) of
successive events, then N((a, b]) is the number of values $t_i$ for which
$a < t_i \le b$.


Figure 2.1  A Poisson point process. (The illustration marks event times
along the t axis, with N((a, b]) = 3 events falling in the interval (a, b].)
We make the following postulates:


(1) The numbers of events happening in disjoint intervals are
independent random variables. That is, for every integer m = 2, 3, ... and
time points $t_0 = 0 < t_1 < t_2 < \cdots < t_m$, the random variables

$$N((t_0, t_1]),\ N((t_1, t_2]),\ \ldots,\ N((t_{m-1}, t_m])$$

are independent.

(2) For any time t and positive number h, the probability distribution
of N((t, t + h]), the number of events occurring between time t and
t + h, depends only on the interval length h and not on the time t.

(3) There is a positive constant $\lambda$ for which the probability of at least
one event happening in a time interval of length h is

$$\Pr\{N((t, t+h]) \ge 1\} = \lambda h + o(h) \quad \text{as } h \downarrow 0.$$

(Conforming to a common notation, here o(h) as $h \downarrow 0$ stands for a
general and unspecified remainder term for which $o(h)/h \to 0$ as
$h \downarrow 0$. That is, a remainder term of smaller order than h as h vanishes.)

(4) The probability of two or more events occurring in an interval of
length h is o(h), or

$$\Pr\{N((t, t+h]) \ge 2\} = o(h), \quad h \downarrow 0.$$


Postulate 3 is a specific formulation of the notion that events are rare. 
Postulate 4 is tantamount to excluding the possibility of the simultaneous 
occurrence of two or more events. In the presence of Postulates 1 and 2,
Postulates 3 and 4 are equivalent to the apparently weaker assumption that 
events occur singly and discretely, with only a finite number in any finite 
interval. In the concrete illustrations cited earlier, this requirement is usu- 
ally satisfied. 

Disjoint intervals are independent by Postulate 1, and Postulate 2 asserts that the
distribution of N((s, t]) is the same as that of N((0, t - s]). Therefore, to describe
the probability law of the system, it suffices to determine the probability
distribution of N((0, t]) for an arbitrary value of t. Let

$$P_k(t) = \Pr\{N((0, t]) = k\}.$$
We will show that Postulates 1 through 4 require that $P_k(t)$ be the Poisson
distribution

$$P_k(t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \quad \text{for } k = 0, 1, \ldots. \qquad (2.5)$$

To establish (2.5), divide the interval (0, t] into n subintervals of equal
length h = t/n, and let

$$\epsilon_i = \begin{cases} 1 & \text{if there is at least one event in the interval } ((i-1)t/n,\ it/n], \\ 0 & \text{otherwise.} \end{cases}$$


Then $S_n = \epsilon_1 + \cdots + \epsilon_n$ counts the total number of subintervals that
contain at least one event, and

$$p_i = \Pr\{\epsilon_i = 1\} = \lambda t/n + o(t/n) \qquad (2.6)$$

according to Postulate 3. Upon applying (2.4), we see that

$$\left|\Pr\{S_n = k\} - \frac{\mu^k e^{-\mu}}{k!}\right| \le n[\lambda t/n + o(t/n)]^2 = \frac{(\lambda t)^2}{n} + 2\lambda t\, o\!\left(\frac{t}{n}\right) + n\, o\!\left(\frac{t}{n}\right)^2,$$

where

$$\mu = \sum_{i=1}^{n} p_i = \lambda t + n\, o(t/n). \qquad (2.7)$$

Because o(h) = o(t/n) is a term of order smaller than h = t/n for large n,
it follows that

$$n\, o(t/n) = t\,\frac{o(t/n)}{t/n} = t\,\frac{o(h)}{h}$$

vanishes for arbitrarily large n. Passing to the limit as $n \to \infty$, then, we
deduce that

$$\lim_{n \to \infty} \Pr\{S_n = k\} = \frac{\mu^k e^{-\mu}}{k!}, \quad \text{with } \mu = \lambda t.$$


To complete the demonstration, we need only show that

$$\lim_{n \to \infty} \Pr\{S_n = k\} = \Pr\{N((0, t]) = k\} = P_k(t).$$

But $S_n$ differs from N((0, t]) only if at least one of the subintervals
contains two or more events, and Postulate 4 precludes this because

$$|P_k(t) - \Pr\{S_n = k\}| \le \Pr\{N((0, t]) \ne S_n\}$$

$$\le \sum_{i=1}^{n} \Pr\left\{N\!\left(\left(\frac{(i-1)t}{n}, \frac{it}{n}\right]\right) \ge 2\right\}$$

$$\le n\, o(t/n) \quad \text{(by Postulate 4)}$$

$$\to 0 \quad \text{as } n \to \infty.$$

By taking n arbitrarily large, or equivalently, by dividing the interval
(0, t] into arbitrarily small subintervals, we see that it must be the case that

$$\Pr\{N((0, t]) = k\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \quad \text{for } k = 0, 1, \ldots,$$

and Postulates 1 through 4 imply the Poisson distribution.

Postulates 1 through 4 arise as physically plausible assumptions in 
many circumstances of stochastic modeling. The postulates seem rather 
weak. Surprisingly, they are sufficiently strong to force the Poisson be- 
havior just derived. This motivates the following definition. 
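The subdivision argument can be illustrated numerically. As an assumption for the sketch, the events are taken to come from a Poisson process, so that each of the n subintervals contains at least one event with probability $1 - e^{-\lambda t/n}$, which has the form $\lambda t/n + o(t/n)$ required by (2.6); the number $S_n$ of occupied subintervals is then binomial. The values of `lam` and `t` and the function names are illustrative.

```python
import math


def binomial_pmf(k, n, p):
    """Pr{S_n = k} when S_n is binomial with parameters n and p."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, mu):
    """Poisson probability mu^k e^{-mu} / k!."""
    return math.exp(-mu) * mu**k / math.factorial(k)


lam, t = 2.0, 1.5   # illustrative rate and interval length


def max_gap(n, kmax=25):
    """Largest difference between Pr{S_n = k} and the Poisson(lam * t) limit."""
    p = 1.0 - math.exp(-lam * t / n)   # Pr{at least one event in a subinterval}
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam * t))
        for k in range(min(n, kmax) + 1)
    )


gaps = {n: max_gap(n) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, f"{gap:.5f}")
```

The gap should shrink as n grows, in line with the bound $n[\lambda t/n + o(t/n)]^2$ tending to zero.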


Definition  Let N((s, t]) be a random variable counting the number of
events occurring in an interval (s, t]. Then N((s, t]) is a Poisson point
process of intensity $\lambda > 0$ if


(i) for every m = 2, 3, ... and distinct time points
$t_0 = 0 < t_1 < t_2 < \cdots < t_m$, the random variables

$$N((t_0, t_1]),\ N((t_1, t_2]),\ \ldots,\ N((t_{m-1}, t_m])$$

are independent; and
(ii) for any times s < t the random variable N((s, t]) has the Poisson
distribution

$$\Pr\{N((s, t]) = k\} = \frac{[\lambda(t-s)]^k e^{-\lambda(t-s)}}{k!}, \quad k = 0, 1, \ldots.$$

Poisson point processes often arise in a form where the time parameter 
is replaced by a suitable spatial parameter. The following formal example 
illustrates this vein of ideas. Consider an array of points distributed in a 
space E (E is a Euclidean space of dimension $d \ge 1$). Let N(A) denote the
number of points (finite or infinite) contained in the region A of E. We
postulate that N(A) is a random variable. The collection {N(A)} of random
variables, where A varies over all possible subsets of E, is said to be a
homogeneous Poisson process if the following assumptions are fulfilled:


(i) The numbers of points in nonoverlapping regions are independent
random variables.

(ii) For any region A of finite volume, N(A) is Poisson distributed with
mean $\lambda|A|$, where |A| is the volume of A. The parameter $\lambda$ is fixed
and measures in a sense the intensity component of the distribution,
which is independent of the size or shape.

Spatial Poisson processes arise in considering such phenomena as the
distribution of stars or galaxies in space, the spatial distribution of plants and
animals, and the spatial distribution of bacteria on a slide. These
ideas and concepts will be further studied in Section 5.


2.2 Proof of Theorem 2.1 


First, some notation. Let $\epsilon(p)$ denote a Bernoulli random variable with
success probability p, and let $X(\theta)$ be a Poisson distributed random
variable with parameter $\theta$. We are given probabilities $p_1, \ldots, p_n$ and let
$\mu = p_1 + \cdots + p_n$. With $\epsilon(p_1), \ldots, \epsilon(p_n)$ assumed to be independent, we
have $S_n = \epsilon(p_1) + \cdots + \epsilon(p_n)$, and according to Theorem 1.1, we may
write $X(\mu)$ as the sum of independent Poisson distributed random
variables in the form $X(\mu) = X(p_1) + \cdots + X(p_n)$. We are asked to compare
$\Pr\{S_n = k\}$ with $\Pr\{X(\mu) = k\}$, and, as a first step, we observe that if $S_n$
and $X(\mu)$ are unequal, then at least one of the pairs $\epsilon(p_k)$ and $X(p_k)$ must
differ, whence

$$|\Pr\{S_n = k\} - \Pr\{X(\mu) = k\}| \le \sum_{k=1}^{n} \Pr\{\epsilon(p_k) \ne X(p_k)\}. \qquad (2.8)$$


As the second step, observe that the quantities that are compared on the
left of (2.8) are the marginal distributions of $S_n$ and $X(\mu)$, while the bound
on the right is a joint probability. This leaves us free to choose the joint
distribution that makes our task the easiest. That is, we are free to specify
the joint distribution of each $\epsilon(p_k)$ and $X(p_k)$, as we please, provided only
that the marginal distributions are Bernoulli and Poisson, respectively.
To complete the proof, we need to show that $\Pr\{\epsilon(p) \ne X(p)\} \le p^2$ for
some Bernoulli random variable $\epsilon(p)$ and Poisson random variable X(p),
since this reduces the right side of (2.8) to that of (2.4). Equivalently, we
want to show that $1 - p^2 \le \Pr\{\epsilon(p) = X(p)\} = \Pr\{\epsilon(p) = X(p) = 0\} +
\Pr\{\epsilon(p) = X(p) = 1\}$, and we are free to choose the joint distribution,
provided that the marginal distributions are correct.

Let U be a random variable that is uniformly distributed over the
interval (0, 1]. Define

$$\epsilon(p) = \begin{cases} 0 & \text{if } 0 < U \le 1 - p, \\ 1 & \text{if } 1 - p < U \le 1, \end{cases}$$

and for k = 0, 1, ..., set

$$X(p) = k \quad \text{when} \quad \sum_{i=0}^{k-1} \frac{p^i e^{-p}}{i!} < U \le \sum_{i=0}^{k} \frac{p^i e^{-p}}{i!}.$$

It is elementary to verify that $\epsilon(p)$ and X(p) have the correct marginal
distributions. Furthermore, because $1 - p \le e^{-p}$, we have $\epsilon(p) = X(p) = 0$
only for $U \le 1 - p$, whence $\Pr\{\epsilon(p) = X(p) = 0\} = 1 - p$. Similarly,
$\epsilon(p) = X(p) = 1$ only when $e^{-p} < U \le (1 + p)e^{-p}$, whence
$\Pr\{\epsilon(p) = X(p) = 1\} = pe^{-p}$. Upon summing these two evaluations, we obtain

$$\Pr\{\epsilon(p) = X(p)\} = 1 - p + pe^{-p} = 1 - p^2 + p^3/2 - \cdots \ge 1 - p^2,$$

as was to be shown. This completes the proof of (2.4).
Problem 2.10 calls for the reader to review the proof and to discover the
single line that needs to be changed in order to establish the stronger result

$$|\Pr\{S_n \text{ in } I\} - \Pr\{X(\mu) \text{ in } I\}| \le \sum_{k=1}^{n} p_k^2$$

for any set of nonnegative integers I.
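The coupling used in the proof can be simulated directly: a single uniform variable U drives both the Bernoulli variable and the Poisson variable. The sketch below takes $\epsilon(p) = 1$ exactly when $U > 1 - p$, which is the assignment consistent with the two evaluations $1 - p$ and $pe^{-p}$ in the proof. The value of p, the seed, and the number of trials are arbitrary choices for the illustration.

```python
import math
import random


def coupled_pair(p, u):
    """Build (eps, x) from one uniform value u in (0, 1].

    eps is Bernoulli(p): it equals 1 exactly when u > 1 - p.
    x is Poisson(p): the Poisson distribution function is inverted at u.
    """
    eps = 1 if u > 1.0 - p else 0
    k = 0
    term = math.exp(-p)      # Pr{X = 0}
    cdf = term
    # The term > 0 guard stops the loop if rounding keeps cdf just below u.
    while u > cdf and term > 0.0:
        k += 1
        term *= p / k        # Pr{X = k} from Pr{X = k - 1}
        cdf += term
    return eps, k


rng = random.Random(2024)
p = 0.3
trials = 200_000
mismatches = 0
for _ in range(trials):
    u = 1.0 - rng.random()   # uniform on (0, 1]
    eps, x = coupled_pair(p, u)
    if eps != x:
        mismatches += 1

freq = mismatches / trials
exact = p * (1.0 - math.exp(-p))   # 1 - (1 - p + p e^{-p})
print(f"observed {freq:.4f}, exact {exact:.4f}, bound p^2 = {p * p:.4f}")
```

The observed mismatch frequency should be close to $p(1 - e^{-p})$, which never exceeds $p^2$ because $1 - e^{-p} \le p$.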


Exercises 


2.1. Determine numerical values to three decimal places for Pr{X = k}, 
k = 0, 1, 2, when 


(a) X has a binomial distribution with parameters n = 20 and p = 0.06. 
(b) X has a binomial distribution with parameters n = 40 and p = 0.03. 
(c) X has a Poisson distribution with parameter $\lambda = 1.2$.

2.2. Explain in general terms why it might be plausible to assume that
the following random variables follow a Poisson distribution:


(a) The number of customers that enter a store in a fixed time period. 

(b) The number of customers that enter a store and buy something in a 
fixed time period. 

(c) The number of atomic particles in a radioactive mass that disinte- 
grate in a fixed time period. 


2.3. A large number of distinct pairs of socks are in a drawer, all mixed 
up. A small number of individual socks are removed. Explain in general 
terms why it might be plausible to assume that the number of pairs among 
the socks removed might follow a Poisson distribution. 


2.4. Suppose that a book of 600 pages contains a total of 240 typograph- 
ical errors. Develop a Poisson approximation for the probability that three 
particular successive pages are error-free. 


Problems 


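2.1. Let X(n, p) have a binomial distribution with parameters n and p.
Let $n \to \infty$ and $p \to 0$ in such a way that $np = \lambda$. Show that

$$\lim_{n \to \infty} \Pr\{X(n, p) = 0\} = e^{-\lambda}$$

and

$$\lim_{n \to \infty} \frac{\Pr\{X(n, p) = k + 1\}}{\Pr\{X(n, p) = k\}} = \frac{\lambda}{k + 1} \quad \text{for } k = 0, 1, \ldots.$$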


2.2. Suppose that 100 tags, numbered 1, 2, ..., 100, are placed into an
urn, and 10 tags are drawn successively, with replacement. Let A be the
event that no tag is drawn twice. Show that

$$\Pr\{A\} = \left(1 - \tfrac{1}{100}\right)\left(1 - \tfrac{2}{100}\right)\cdots\left(1 - \tfrac{9}{100}\right) = 0.6282.$$

Use the approximation

$$1 - x \approx e^{-x} \quad \text{for } x \approx 0$$

to get

$$\Pr\{A\} \approx \exp\left\{-\tfrac{1}{100}(1 + 2 + \cdots + 9)\right\} = e^{-0.45} = 0.6376.$$

Interpret this in terms of the law of rare events.


2.3. Suppose that N pairs of socks are sent to a laundry, where they are
washed and thoroughly mixed up to create a mass of unmatched socks.
Then n socks are drawn at random without replacement from the pile. Let
A be the event that no pair is among the n socks so selected. Show that

$$\Pr\{A\} = \frac{2^n \binom{N}{n}}{\binom{2N}{n}} = \prod_{i=1}^{n-1}\left(1 - \frac{i}{2N - i}\right).$$

Use the approximation

$$1 - x \approx e^{-x} \quad \text{for } x \approx 0$$

to get

$$\Pr\{A\} \approx \exp\left\{-\sum_{i=1}^{n-1} \frac{i}{2N - i}\right\} \approx \exp\left\{-\frac{n(n-1)}{4N}\right\},$$

the approximations holding when n is small relative to N, which is large.
Evaluate the exact expression and each approximation when N = 100 and
n = 10. Is the approximation here consistent with the actual number of
pairs of socks among the n socks drawn having a Poisson distribution?
Answer: Exact 0.7895; Approximate 0.7985.


2.4. Suppose that N points are uniformly distributed over the interval
[0, N). Determine the probability distribution for the number of points in
the interval [0, 1) as $N \to \infty$.


2.5. Suppose that N points are uniformly distributed over the surface of
a circular disk of radius r. Determine the probability distribution for the
number of points within a distance of one of the origin as $N \to \infty$, $r \to \infty$,
$N/(\pi r^2) = \lambda$.


2.6. Certain computer coding systems use randomization to assign
memory storage locations to account numbers. Suppose that $N = M\lambda$
different accounts are to be randomly located among M storage locations.
Let $X_i$ be the number of accounts assigned to the ith location. If the
accounts are distributed independently and each location is equally likely to
be chosen, show that $\Pr\{X_i = k\} \to e^{-\lambda}\lambda^k/k!$ as $N \to \infty$. Show that $X_i$ and
$X_j$ are independent random variables in the limit, for distinct locations
$i \ne j$. In the limit, what fraction of storage locations have two or more
accounts assigned to them?


2.7. N bacteria are spread independently with uniform distribution on a
microscope slide of area A. An arbitrary region having area a is selected for
observation. Determine the probability of k bacteria within the region of
area a. Show that as $N \to \infty$ and $a \to 0$ such that $(a/A)N \to c$ ($0 < c < \infty$),
then $p(k) \to e^{-c}c^k/k!$.


2.8. Using (2.3), evaluate the exact probabilities for $S_n$ and the Poisson
approximation and error bound in (2.4) when n = 4 and $p_1 = 0.1$, $p_2 = 0.2$,
$p_3 = 0.3$, and $p_4 = 0.4$.


2.9. Using (2.3), evaluate the exact probabilities for $S_n$ and the Poisson
approximation and error bound in (2.4) when n = 4 and $p_1 = 0.1$, $p_2 = 0.1$,
$p_3 = 0.1$, and $p_4 = 0.2$.


2.10. Review the proof of Theorem 2.1 in Section 2.2 and establish the
stronger result

$$|\Pr\{S_n \text{ in } I\} - \Pr\{X(\mu) \text{ in } I\}| \le \sum_{k=1}^{n} p_k^2$$

for any set of nonnegative integers I.


2.11. Let X and Y be jointly distributed random variables and B an
arbitrary set. Fill in the details that justify the inequality
$|\Pr\{X \text{ in } B\} - \Pr\{Y \text{ in } B\}| \le \Pr\{X \ne Y\}$.


Hint: Begin with

$$\{X \text{ in } B\} = \{X \text{ in } B \text{ and } Y \text{ in } B\} \text{ or } \{X \text{ in } B \text{ and } Y \text{ not in } B\} \subset \{Y \text{ in } B\} \text{ or } \{X \ne Y\}.$$


2.12. Computer Challenge  Most computers have available a routine
for simulating a sequence $U_0, U_1, \ldots$ of independent random variables,
each uniformly distributed on the interval (0, 1). Plot, say, 10,000 pairs
$(U_{2n}, U_{2n+1})$ on the unit square. Does the plot look like what you would
expect? Repeat the experiment several times. Do the points in a fixed
number of disjoint squares of area 1/10,000 look like independent unit
Poisson random variables?


3. Distributions Associated with the Poisson Process


A Poisson point process N((s, t]) counts the number of events occurring 
in an interval (s, t]. A Poisson counting process, or more simply a Poisson 
process X(t), counts the number of events occurring up to time t. Formally,
X(t) = N((0, t]).

Poisson events occurring in space can best be modeled as a point 
process. For Poisson events occurring on the positive time axis, whether 
we view them as a Poisson point process or Poisson counting process is 
largely a matter of convenience, and we will freely do both. The two de- 
scriptions are equivalent for Poisson events occurring along a line. The 
Poisson process is the more common and traditional description in this 
case because it allows a pictorial representation as an increasing integer- 
valued random function taking unit steps. 

Figure 3.1 shows a typical sample path of a Poisson process where $W_n$ is
the time of occurrence of the nth event, the so-called waiting time. It is often
convenient to set $W_0 = 0$. The differences $S_n = W_{n+1} - W_n$ are called sojourn
times; $S_n$ measures the duration that the Poisson process sojourns in state n.


Figure 3.1  A typical sample path of a Poisson process showing the waiting
times $W_n$ and the sojourn times $S_n$.

In this section we will determine a number of probability distributions
associated with the Poisson process X(t), the waiting times $W_n$, and the
sojourn times $S_n$.


Theorem 3.1  The waiting time $W_n$ has the gamma distribution whose
probability density function is

$$f_{W_n}(t) = \frac{\lambda^n t^{n-1}}{(n-1)!}\, e^{-\lambda t}, \quad n = 1, 2, \ldots,\ t \ge 0. \qquad (3.1)$$

In particular, $W_1$, the time to the first event, is exponentially distributed:

$$f_{W_1}(t) = \lambda e^{-\lambda t}, \quad t \ge 0. \qquad (3.2)$$


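Proof  The event $W_n \le t$ occurs if and only if there are at least n events
in the interval (0, t], and since the number of events in (0, t] has a Poisson
distribution with mean $\lambda t$ we obtain the cumulative distribution function
of $W_n$ via

$$F_{W_n}(t) = \Pr\{W_n \le t\} = \Pr\{X(t) \ge n\} = \sum_{k=n}^{\infty} \frac{(\lambda t)^k e^{-\lambda t}}{k!} = 1 - \sum_{k=0}^{n-1} \frac{(\lambda t)^k e^{-\lambda t}}{k!}.$$

We obtain the probability density function $f_{W_n}(t)$ by differentiating the
cumulative distribution function. Then

$$f_{W_n}(t) = \frac{d}{dt} F_{W_n}(t)$$

$$= \frac{d}{dt}\left\{1 - e^{-\lambda t}\left[1 + \frac{\lambda t}{1!} + \frac{(\lambda t)^2}{2!} + \cdots + \frac{(\lambda t)^{n-1}}{(n-1)!}\right]\right\}$$

$$= -e^{-\lambda t}\left[\lambda + \frac{\lambda(\lambda t)}{1!} + \frac{\lambda(\lambda t)^2}{2!} + \cdots + \frac{\lambda(\lambda t)^{n-2}}{(n-2)!}\right] + \lambda e^{-\lambda t}\left[1 + \frac{\lambda t}{1!} + \frac{(\lambda t)^2}{2!} + \cdots + \frac{(\lambda t)^{n-1}}{(n-1)!}\right]$$

$$= \frac{\lambda^n t^{n-1}}{(n-1)!}\, e^{-\lambda t}, \quad n = 1, 2, \ldots,\ t \ge 0.$$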
There is an alternative derivation of the density in (3.1) that uses the
Poisson point process N((s, t]) and proceeds directly without
differentiation. The event $t < W_n \le t + \Delta t$ corresponds exactly to n - 1 occurrences
in (0, t] and one in $(t, t + \Delta t]$, as depicted in Figure 3.2.


Figure 3.2  (The illustration shows $N((0, t]) = n - 1$ and $N((t, t + \Delta t]) = 1$,
with $W_n$ falling in $(t, t + \Delta t]$.)


Then

$$f_{W_n}(t)\Delta t \approx \Pr\{t < W_n \le t + \Delta t\} + o(\Delta t) \quad \text{[see I, (2.5)]}$$

$$= \Pr\{N((0, t]) = n - 1\}\, \Pr\{N((t, t + \Delta t]) = 1\} + o(\Delta t)$$

$$= \frac{(\lambda t)^{n-1} e^{-\lambda t}}{(n-1)!}\, \lambda(\Delta t) + o(\Delta t).$$

Dividing by $\Delta t$ and passing to the limit as $\Delta t \to 0$ we obtain (3.1).
Observe that $\Pr\{N((t, t + \Delta t]) \ge 1\} = \Pr\{N((t, t + \Delta t]) = 1\} + o(\Delta t) =
\lambda(\Delta t) + o(\Delta t)$. □
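The relation between the distribution function and the density in Theorem 3.1 can be checked numerically. The sketch below differentiates $F_{W_n}$ by a central difference and compares the slope with the gamma density (3.1); the parameter values and the step `dt` are arbitrary illustrative choices.

```python
import math


def waiting_cdf(t, n, lam):
    """Pr{W_n <= t} = Pr{X(t) >= n} = 1 - sum_{k < n} (lam t)^k e^{-lam t} / k!."""
    head = sum((lam * t) ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-lam * t) * head


def waiting_pdf(t, n, lam):
    """Gamma density (3.1) of the waiting time W_n."""
    return lam**n * t ** (n - 1) / math.factorial(n - 1) * math.exp(-lam * t)


lam, n, t, dt = 2.0, 3, 1.2, 1e-5
# Central difference approximation to the derivative of the distribution function.
slope = (waiting_cdf(t + dt, n, lam) - waiting_cdf(t - dt, n, lam)) / (2 * dt)
density = waiting_pdf(t, n, lam)
print(f"numerical slope {slope:.8f}, density (3.1) {density:.8f}")
```

With n = 1 the same functions reduce to the exponential case (3.2).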


Theorem 3.2  The sojourn times $S_0, S_1, \ldots, S_{n-1}$ are independent
random variables, each having the exponential probability density function

$$f_{S_k}(s) = \lambda e^{-\lambda s}, \quad s \ge 0. \qquad (3.3)$$

Proof  We are being asked to show that the joint probability density
function of $S_0, S_1, \ldots, S_{n-1}$ is the product of the exponential densities
given by

$$f_{S_0, S_1, \ldots, S_{n-1}}(s_0, s_1, \ldots, s_{n-1}) = (\lambda e^{-\lambda s_0})(\lambda e^{-\lambda s_1}) \cdots (\lambda e^{-\lambda s_{n-1}}). \qquad (3.4)$$


We give the proof only in the case n = 2, the general case being entirely 
similar. Referring to Figure 3.3 we see that the joint occurrence of 


s,<$,<s,+ As, and s,<$,< 5, + As, 
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5; ——>| As, |< $. ——> As, x— 


Figure 3.3 


corresponds to no events in the intervals (0, s,] and (s, + As,, 5, + As, + s,] 
and exactly one event in each of the intervals (s,, s, + As,] and (s, + As, + 
S55 Ay + As, + S> + As,]. Thus 


Fo,.s(Sis 52) As, As, = Pr{s, < S, <s, + As,, 55 < S, << s, + As,} 

+ o(As, As,) 

= Pr{M((0, s,]) = 0} 
x Pr{N((s, + As,, 5s, + As, + 5,]) = 0} 
x Pr{N((s,, 5, + As,]) = 1} 
x Pr{N((s, + As, + 5,, 5, + As, + s, + As,]) = 1} 
+ o(As, As,) 

= e Mie Me74 Se744%)A(As,)A(AS,) + o(As, As.) 

= (Ae~*")(Ae*)(As,)(As,) + o(As, As,). 


Upon dividing both sides by (As,)(As,) and passing to the limit as As, — 0 
and As, —> 0, we obtain (3.4) in the case n = 2. 

The binomial distribution also arises in the context of Poisson 
processes. 


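Theorem 3.3  Let $\{X(t)\}$ be a Poisson process of rate $\lambda > 0$. Then for
$0 < u < t$ and $0 \le k \le n$,

$$\Pr\{X(u) = k \mid X(t) = n\} = \frac{n!}{k!\,(n-k)!} \left(\frac{u}{t}\right)^{k} \left(1 - \frac{u}{t}\right)^{n-k}. \tag{3.5}$$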
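Proof  Straightforward computations give

$$\begin{aligned}
\Pr\{X(u) = k \mid X(t) = n\}
 &= \frac{\Pr\{X(u) = k \text{ and } X(t) = n\}}{\Pr\{X(t) = n\}} \\
 &= \frac{\Pr\{X(u) = k \text{ and } X(t) - X(u) = n - k\}}{\Pr\{X(t) = n\}} \\
 &= \frac{\{e^{-\lambda u}(\lambda u)^k / k!\}\{e^{-\lambda(t-u)}[\lambda(t-u)]^{n-k} / (n-k)!\}}{e^{-\lambda t}(\lambda t)^n / n!} \\
 &= \frac{n!}{k!\,(n-k)!}\,\frac{u^k (t-u)^{n-k}}{t^n},
\end{aligned}$$

which establishes (3.5).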
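
The cancellation of $\lambda$ in this proof can be seen numerically. The Python sketch below is illustrative only (the function names and the sample values are assumptions made here): it computes the conditional probability directly from independent Poisson increments and compares it with the binomial form (3.5) for two different rates.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability e^{-mean} mean^k / k!."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def conditional_direct(k, n, u, t, lam):
    """Pr{X(u)=k | X(t)=n} from the independent increments on (0,u] and (u,t]."""
    joint = poisson_pmf(k, lam * u) * poisson_pmf(n - k, lam * (t - u))
    return joint / poisson_pmf(n, lam * t)

def conditional_binomial(k, n, u, t):
    """Right-hand side of (3.5): binomial with success probability u/t."""
    p = u / t
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

if __name__ == "__main__":
    # Illustrative values: n = 4 events by time t = 5, split at u = 2.
    for lam in (0.5, 3.0):
        for k in range(5):
            print(lam, k, conditional_direct(k, 4, 2.0, 5.0, lam),
                  conditional_binomial(k, 4, 2.0, 5.0))
```

The direct computation gives the same values for every rate, which is the content of the theorem: given $X(t) = n$, the count in $(0, u]$ does not depend on $\lambda$.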


Exercises 


3.1. A radioactive source emits particles according to a Poisson process
of rate $\lambda = 2$ particles per minute. What is the probability that the first
particle appears after three minutes?


3.2. A radioactive source emits particles according to a Poisson process
of rate $\lambda = 2$ particles per minute.


(a) What is the probability that the first particle appears some time 
after three minutes but before five minutes? 

(b) What is the probability that exactly one particle is emitted in the in- 
terval from three to five minutes? 


3.3. Customers enter a store according to a Poisson process of rate $\lambda = 6$
per hour. Suppose it is known that only a single customer entered during
the first hour. What is the conditional probability that this person entered
during the first fifteen minutes?


3.4. Let $X(t)$ be a Poisson process of rate $\xi = 3$ per hour. Find the
conditional probability that there were two events in the first hour, given that
there were five events in the first three hours.

3.5. Let $X(t)$ be a Poisson process of rate $\theta$ per hour. Find the conditional
probability that there were $m$ events in the first $t$ hours, given that there
were $n$ events in the first $T$ hours. Assume $0 \le m \le n$ and $0 < t < T$.


3.6. For $i = 1, \ldots, n$, let $\{X_i(t); t \ge 0\}$ be independent Poisson
processes, each with the same parameter $\lambda$. Find the distribution of the
first time that at least one event has occurred in every process.


3.7. Customers arrive at a service facility according to a Poisson
process of rate $\lambda$ customers/hour. Let $X(t)$ be the number of customers that
have arrived up to time $t$. Let $W_1, W_2, \ldots$ be the successive arrival times
of the customers. Determine the conditional mean E(W.|X(t) = 3]. 


3.8. Customers arrive at a service facility according to a Poisson
process of rate $\lambda = 5$ per hour. Given that 12 customers arrived during the
first two hours of service, what is the conditional probability that 5
customers arrived during the first hour?


3.9. Let $X(t)$ be a Poisson process of rate $\lambda$. Determine the cumulative
distribution function of the gamma density as a sum of Poisson probabilities
by first verifying and then using the identity $W_r \le t$ if and only if
$X(t) \ge r$.


Problems 


3.1. Let $X(t)$ be a Poisson process of rate $\lambda$. Validate the identity

$$\{W_1 > w_1,\ W_2 > w_2\}$$

if and only if

$$\{X(w_1) = 0,\ X(w_2) - X(w_1) = 0 \text{ or } 1\}.$$

Use this to determine the joint upper tail probability

$$\begin{aligned}
\Pr\{W_1 > w_1,\ W_2 > w_2\} &= \Pr\{X(w_1) = 0,\ X(w_2) - X(w_1) = 0 \text{ or } 1\} \\
 &= e^{-\lambda w_1}[1 + \lambda(w_2 - w_1)]\,e^{-\lambda(w_2 - w_1)}.
\end{aligned}$$

Finally, differentiate twice to obtain the joint density function

$$f(w_1, w_2) = \lambda^2 \exp\{-\lambda w_2\} \qquad \text{for } 0 < w_1 < w_2.$$

3.2. The joint probability density function for the waiting times $W_1$ and
$W_2$ is given by

$$f(w_1, w_2) = \lambda^2 \exp\{-\lambda w_2\} \qquad \text{for } 0 < w_1 < w_2.$$

Determine the conditional probability density function for $W_1$, given that
$W_2 = w_2$. How does this result differ from that in Theorem 3.3 when
$n = 2$ and $k = 1$?


3.3. The joint probability density function for the waiting times $W_1$ and
$W_2$ is given by

$$f(w_1, w_2) = \lambda^2 \exp\{-\lambda w_2\} \qquad \text{for } 0 < w_1 < w_2.$$

Change variables according to

$$S_0 = W_1 \quad\text{and}\quad S_1 = W_2 - W_1$$

and determine the joint distribution of the first two sojourn times. Compare
with Theorem 3.2.


3.4. The joint probability density function for the waiting times $W_1$ and
$W_2$ is given by

$$f(w_1, w_2) = \lambda^2 \exp\{-\lambda w_2\} \qquad \text{for } 0 < w_1 < w_2.$$

Determine the marginal density functions for $W_1$ and $W_2$, and check your
work by comparison with Theorem 3.1.


3.5. Let $X(t)$ be a Poisson process with parameter $\lambda$. Independently, let
$T$ be a random variable with the exponential density

$$f_T(t) = \theta e^{-\theta t} \qquad \text{for } t > 0.$$

Determine the probability mass function for $X(T)$.


Hint: Use the law of total probability and I, (6.4). Alternatively, use
the results of I, Section 5.2.


3.6. Customers arrive at a holding facility at random according to a
Poisson process having rate $\lambda$. The facility processes in batches of size $Q$.
That is, the first $Q - 1$ customers wait until the arrival of the $Q$th customer.
Then all are passed simultaneously, and the process repeats. Service
times are instantaneous. Let $N(t)$ be the number of customers in the
holding facility at time $t$. Assume that $N(0) = 0$ and let
$T = \min\{t \ge 0 : N(t) = Q\}$ be the first dispatch time. Show that
$E[T] = Q/\lambda$ and

$$E\left[\int_0^T N(t)\,dt\right] = \frac{1 + 2 + \cdots + (Q - 1)}{\lambda} = \frac{Q(Q - 1)}{2\lambda}.$$


3.7. A critical component on a submarine has an operating lifetime that
is exponentially distributed with mean 0.50 years. As soon as a component
fails, it is replaced by a new one having statistically identical properties.
What is the smallest number of spare components that the submarine
should stock if it is leaving for a one-year tour and wishes the probability
of having an inoperable unit caused by failures exceeding the spare
inventory to be less than 0.02?


3.8. Consider a Poisson process with parameter $\lambda$. Given that $X(t) = n$
events occur in time $t$, find the density function for $W_r$, the time of
occurrence of the $r$th event. Assume that $r \le n$.


3.9. The following calculations arise in certain highly simplified models
of learning processes. Let $X_1(t)$ and $X_2(t)$ be independent Poisson
processes having parameters $\lambda_1$ and $\lambda_2$, respectively.


(a) What is the probability that $X_1(t) = 1$ before $X_2(t) = 1$?
(b) What is the probability that $X_1(t) = 2$ before $X_2(t) = 2$?


3.10. Let $\{W_n\}$ be the sequence of waiting times in a Poisson process
of intensity $\lambda = 1$. Show that $X_n = 2^n \exp\{-W_n\}$ defines a nonnegative
martingale.


4. The Uniform Distribution and Poisson Processes 


The major result of this section, Theorem 4.1, provides an important tool 
for computing certain functionals on a Poisson process. It asserts that, 
conditioned on a fixed total number of events in an interval, the locations 
of those events are uniformly distributed in a certain way. 

After a complete discussion of the theorem and its proof, its application 
in a wide range of problems will be given. 

In order to completely understand the theorem, consider first the following
experiment. We begin with a line segment $t$ units long and a fixed
number $n$ of darts and throw darts at the line segment in such a way that
each dart's position upon landing is uniformly distributed along the segment,
independent of the location of the other darts. Let $U_1$ be the position
of the first dart thrown, $U_2$ the position of the second, and so on up to $U_n$.
The probability density function is the uniform density

$$f_U(u) = \begin{cases} 1/t & \text{for } 0 \le u \le t, \\ 0 & \text{elsewhere.} \end{cases}$$

Now let $W_1 \le W_2 \le \cdots \le W_n$ denote these same positions, not in the order
in which the darts were thrown, but instead in the order in which they
appear along the line. Figure 4.1 depicts a typical relation between
$U_1, U_2, \ldots, U_n$ and $W_1, W_2, \ldots, W_n$.


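The joint probability density function for $W_1, W_2, \ldots, W_n$ is

$$f_{W_1, \ldots, W_n}(w_1, \ldots, w_n) = n!\,t^{-n} \qquad \text{for } 0 < w_1 < w_2 < \cdots < w_n \le t. \tag{4.1}$$

For example, to establish (4.1) in the case $n = 2$ we have

$$\begin{aligned}
f_{W_1, W_2}(w_1, w_2)\,\Delta w_1\,\Delta w_2
 &= \Pr\{w_1 < W_1 \le w_1 + \Delta w_1,\ w_2 < W_2 \le w_2 + \Delta w_2\} \\
 &= \Pr\{w_1 < U_1 \le w_1 + \Delta w_1,\ w_2 < U_2 \le w_2 + \Delta w_2\} \\
 &\quad + \Pr\{w_1 < U_2 \le w_1 + \Delta w_1,\ w_2 < U_1 \le w_2 + \Delta w_2\} \\
 &= 2\left(\frac{\Delta w_1}{t}\right)\left(\frac{\Delta w_2}{t}\right) = 2 t^{-2}\,\Delta w_1\,\Delta w_2.
\end{aligned}$$

Dividing by $\Delta w_1\,\Delta w_2$ and passing to the limit gives (4.1). When $n = 2$,
there are two ways that $U_1$ and $U_2$ can be ordered; either $U_1$ is less than $U_2$,
or $U_2$ is less than $U_1$. In general there are $n!$ arrangements of $U_1, \ldots, U_n$
that lead to the same ordered values $W_1 \le \cdots \le W_n$, thus giving (4.1).


Figure 4.1  $W_1, W_2, \ldots, W_n$ are the values $U_1, U_2, \ldots, U_n$ arranged in
increasing order.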
Theorem 4.1  Let $W_1, W_2, \ldots$ be the occurrence times in a Poisson
process of rate $\lambda > 0$. Conditioned on $N(t) = n$, the random variables
$W_1, W_2, \ldots, W_n$ have the joint probability density function

$$f_{W_1, \ldots, W_n \mid N(t) = n}(w_1, \ldots, w_n) = n!\,t^{-n} \qquad \text{for } 0 < w_1 < \cdots < w_n \le t. \tag{4.2}$$

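Proof  The event $w_i < W_i \le w_i + \Delta w_i$ for $i = 1, \ldots, n$ and $N(t) = n$
corresponds to no events occurring in any of the intervals $(0, w_1]$,
$(w_1 + \Delta w_1, w_2]$, $\ldots$, $(w_{n-1} + \Delta w_{n-1}, w_n]$, $(w_n + \Delta w_n, t]$, and exactly one
event in each of the intervals $(w_1, w_1 + \Delta w_1]$, $(w_2, w_2 + \Delta w_2]$, $\ldots$,
$(w_n, w_n + \Delta w_n]$. These intervals are disjoint, and

$$\begin{aligned}
\Pr\{N((0, w_1]) = 0, \ldots, N((w_n + \Delta w_n, t]) = 0\}
 &= e^{-\lambda w_1} e^{-\lambda(w_2 - w_1 - \Delta w_1)} \cdots e^{-\lambda(w_n - w_{n-1} - \Delta w_{n-1})} e^{-\lambda(t - w_n - \Delta w_n)} \\
 &= e^{-\lambda t}\left[e^{\lambda(\Delta w_1 + \cdots + \Delta w_n)}\right] \\
 &= e^{-\lambda t}[1 + o(\max\{\Delta w_i\})],
\end{aligned}$$

while

$$\Pr\{N((w_1, w_1 + \Delta w_1]) = 1, \ldots, N((w_n, w_n + \Delta w_n]) = 1\}
 = \lambda(\Delta w_1) \cdots \lambda(\Delta w_n)[1 + o(\max\{\Delta w_i\})].$$

Thus

$$\begin{aligned}
f_{W_1, \ldots, W_n \mid N(t) = n}(w_1, \ldots, w_n)\,\Delta w_1 \cdots \Delta w_n
 &= \Pr\{w_1 < W_1 \le w_1 + \Delta w_1, \ldots, w_n < W_n \le w_n + \Delta w_n \mid N(t) = n\} + o(\Delta w_1 \cdots \Delta w_n) \\
 &= \frac{\Pr\{w_i < W_i \le w_i + \Delta w_i,\ i = 1, \ldots, n,\ N(t) = n\}}{\Pr\{N(t) = n\}} + o(\Delta w_1 \cdots \Delta w_n) \\
 &= \frac{e^{-\lambda t}\,\lambda(\Delta w_1) \cdots \lambda(\Delta w_n)}{e^{-\lambda t}(\lambda t)^n / n!}\,[1 + o(\max\{\Delta w_i\})] \\
 &= n!\,t^{-n}(\Delta w_1) \cdots (\Delta w_n)[1 + o(\max\{\Delta w_i\})].
\end{aligned}$$

Dividing both sides by $(\Delta w_1) \cdots (\Delta w_n)$ and letting $\Delta w_1 \to 0, \ldots, \Delta w_n \to 0$
establishes (4.2).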
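
Theorem 4.1 lends itself to a simulation check. The sketch below is an illustration under assumptions made here (function names, rate, horizon, run count, and random seed are all arbitrary choices): it generates Poisson arrival times from exponential sojourn times, keeps only the runs with exactly $n$ arrivals in $(0, t]$, and compares the average of the last arrival time with the mean of the maximum of $n$ independent uniforms on $(0, t]$, which is $tn/(n+1)$.

```python
import random

def arrival_times(lam, t, rng):
    """Arrival times of a rate-lam Poisson process that fall in (0, t]."""
    times, clock = [], rng.expovariate(lam)
    while clock <= t:
        times.append(clock)
        clock += rng.expovariate(lam)  # add an exponential sojourn time
    return times

def mean_last_arrival_given_count(lam, t, n, runs=40000, seed=7):
    """Monte Carlo estimate of E[W_n | N(t) = n], keeping only runs with n arrivals."""
    rng = random.Random(seed)
    kept = []
    for _ in range(runs):
        w = arrival_times(lam, t, rng)
        if len(w) == n:
            kept.append(w[-1])
    return sum(kept) / len(kept)

def mean_max_of_uniforms(t, n):
    """E[max(U_1, ..., U_n)] for independent uniforms on (0, t]."""
    return t * n / (n + 1)

if __name__ == "__main__":
    print(mean_last_arrival_given_count(1.0, 3.0, 3), mean_max_of_uniforms(3.0, 3))
```

The two numbers should be close, up to Monte Carlo error; this is a sanity check on one functional of the order statistics, not a proof of the theorem.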
Theorem 4.1 has important applications in evaluating certain symmetric
functionals on Poisson processes. Some sample instances follow.


Example  Customers arrive at a facility according to a Poisson process
of rate $\lambda$. Each customer pays \$1 on arrival, and it is desired to evaluate
the expected value of the total sum collected during the interval $(0, t]$
discounted back to time 0. This quantity is given by

$$M = E\left[\sum_{k=1}^{X(t)} e^{-\beta W_k}\right],$$

where $\beta$ is the discount rate, $W_1, W_2, \ldots$ are the arrival times, and $X(t)$ is
the total number of arrivals in $(0, t]$. The process is shown in Figure 4.2.


Figure 4.2  A dollar received at time $W_k$ is discounted to a present value at
time 0 of $\exp\{-\beta W_k\}$.


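We evaluate the mean total discounted sum $M$ by conditioning on
$X(t) = n$. Then

$$M = \sum_{n=1}^{\infty} E\left[\sum_{k=1}^{n} e^{-\beta W_k} \,\Big|\, X(t) = n\right] \Pr\{X(t) = n\}. \tag{4.3}$$

Let $U_1, \ldots, U_n$ denote independent random variables that are uniformly
distributed in $(0, t]$. Because of the symmetry of the functional
$\sum_{k=1}^{n} \exp\{-\beta W_k\}$ and Theorem 4.1, we have

$$\begin{aligned}
E\left[\sum_{k=1}^{n} e^{-\beta W_k} \,\Big|\, X(t) = n\right]
 &= E\left[\sum_{k=1}^{n} e^{-\beta U_k}\right] \\
 &= n E[e^{-\beta U_1}] \\
 &= n t^{-1} \int_0^t e^{-\beta u}\,du \\
 &= \frac{n}{\beta t}\left[1 - e^{-\beta t}\right].
\end{aligned}$$

Substitution into (4.3) then gives

$$\begin{aligned}
M &= \frac{1}{\beta t}\left[1 - e^{-\beta t}\right] \sum_{n=1}^{\infty} n \Pr\{X(t) = n\} \\
 &= \frac{1}{\beta t}\left[1 - e^{-\beta t}\right] E[X(t)] \\
 &= \frac{\lambda}{\beta}\left[1 - e^{-\beta t}\right].
\end{aligned}$$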
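
The closed form for $M$ can be compared with a direct simulation of the discounted sum. The following Python sketch is illustrative only; the function names, the parameter values, the number of runs, and the seed are assumptions made here rather than anything prescribed by the text.

```python
import math
import random

def discounted_mean_closed_form(lam, beta, t):
    """M = (lam / beta) * (1 - exp(-beta * t))."""
    return lam / beta * (1.0 - math.exp(-beta * t))

def simulate_discounted_sum(lam, beta, t, rng):
    """One realization of sum_k exp(-beta * W_k) over arrivals W_k <= t."""
    total, clock = 0.0, rng.expovariate(lam)
    while clock <= t:
        total += math.exp(-beta * clock)
        clock += rng.expovariate(lam)  # next exponential sojourn time
    return total

def monte_carlo_mean(lam, beta, t, runs=20000, seed=1):
    """Average of independent realizations of the discounted sum."""
    rng = random.Random(seed)
    return sum(simulate_discounted_sum(lam, beta, t, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    print(discounted_mean_closed_form(2.0, 0.1, 5.0), monte_carlo_mean(2.0, 0.1, 5.0))
```

The simulation never uses Theorem 4.1, so agreement between the two printed values is an independent check on the argument above.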


Example  Viewing a fixed mass of a certain radioactive material, suppose
that alpha particles appear in time according to a Poisson process of
intensity $\lambda$. Each particle exists for a random duration and is then
annihilated. Suppose that the successive lifetimes $Y_1, Y_2, \ldots$ of distinct particles
are independent random variables having the common distribution function
$G(y) = \Pr\{Y_k \le y\}$. Let $M(t)$ count the number of alpha particles
existing at time $t$. The process is depicted in Figure 4.3.


Figure 4.3  A particle created at time $W_k \le t$ still exists at time $t$ if $W_k + Y_k \ge t$.
We will use Theorem 4.1 to evaluate the probability distribution of $M(t)$
under the condition that $M(0) = 0$.

Let $X(t)$ be the number of particles created up to time $t$, by assumption
a Poisson process of intensity $\lambda$. Observe that $M(t) \le X(t)$; the number of
existing particles cannot exceed the number of particles created. Condition
on $X(t) = n$ and let $W_1, \ldots, W_n \le t$ be the times of particle creation.
Then particle $k$ exists at time $t$ if and only if $W_k + Y_k \ge t$. Let

$$\mathbf{1}\{W_k + Y_k \ge t\} = \begin{cases} 1 & \text{if } W_k + Y_k \ge t, \\ 0 & \text{if } W_k + Y_k < t. \end{cases}$$

Then $\mathbf{1}\{W_k + Y_k \ge t\} = 1$ if and only if the $k$th particle is alive at time $t$.
Thus

$$\Pr\{M(t) = m \mid X(t) = n\} = \Pr\left\{\sum_{k=1}^{n} \mathbf{1}\{W_k + Y_k \ge t\} = m \,\Big|\, X(t) = n\right\}.$$

Invoking Theorem 4.1 and the symmetry among particles, we have

$$\Pr\left\{\sum_{k=1}^{n} \mathbf{1}\{W_k + Y_k \ge t\} = m \,\Big|\, X(t) = n\right\}
 = \Pr\left\{\sum_{k=1}^{n} \mathbf{1}\{U_k + Y_k \ge t\} = m\right\}, \tag{4.4}$$

where $U_1, U_2, \ldots, U_n$ are independent and uniformly distributed on $(0, t]$.
The right-hand side of (4.4) is readily recognized as the binomial distribution
in which

$$\begin{aligned}
p = \Pr\{U_k + Y_k \ge t\} &= \frac{1}{t} \int_0^t \Pr\{Y_k \ge t - u\}\,du \\
 &= \frac{1}{t} \int_0^t [1 - G(t - u)]\,du \\
 &= \frac{1}{t} \int_0^t [1 - G(z)]\,dz.
\end{aligned} \tag{4.5}$$


Thus, explicitly writing the binomial distribution, we have 
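$$\Pr\{M(t) = m \mid X(t) = n\} = \frac{n!}{m!\,(n - m)!}\,p^m (1 - p)^{n - m},$$

with $p$ given by (4.5). Finally,

$$\begin{aligned}
\Pr\{M(t) = m\} &= \sum_{n=m}^{\infty} \Pr\{M(t) = m \mid X(t) = n\} \Pr\{X(t) = n\} \\
 &= \sum_{n=m}^{\infty} \frac{n!}{m!\,(n - m)!}\,p^m (1 - p)^{n - m}\,\frac{(\lambda t)^n e^{-\lambda t}}{n!} \\
 &= e^{-\lambda t}\,\frac{(\lambda p t)^m}{m!} \sum_{n=m}^{\infty} \frac{(1 - p)^{n - m} (\lambda t)^{n - m}}{(n - m)!}.
\end{aligned} \tag{4.6}$$

The infinite sum is an exponential series and reduces according to

$$\sum_{n=m}^{\infty} \frac{(1 - p)^{n - m} (\lambda t)^{n - m}}{(n - m)!} = \sum_{j=0}^{\infty} \frac{[\lambda t (1 - p)]^j}{j!} = e^{\lambda t (1 - p)},$$

and this simplifies (4.6) to

$$\Pr\{M(t) = m\} = \frac{e^{-\lambda p t} (\lambda p t)^m}{m!} \qquad \text{for } m = 0, 1, \ldots.$$

In words, the number of particles existing at time $t$ has a Poisson distribution
with mean

$$\lambda p t = \lambda \int_0^t [1 - G(y)]\,dy. \tag{4.7}$$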
0) 


It is often relevant to let $t \to \infty$ in (4.7) and determine the corresponding
long run distribution. Let $\mu = E[Y_k] = \int_0^\infty [1 - G(y)]\,dy$ be the mean
lifetime of an alpha particle. It is immediate from (4.7) that as $t \to \infty$, the
distribution of $M(t)$ converges to the Poisson distribution with parameter
$\lambda\mu$. A great simplification has taken place. In the long run, the probability
distribution for existing particles depends only on the mean lifetime $\mu$,
and not otherwise on the lifetime distribution $G(y)$. In practical terms this
statement implies that in order to apply this model, only the mean lifetime
$\mu$ need be known.
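
The reduction of the mixture (4.6) to a Poisson distribution can be verified numerically for a specific lifetime distribution. The Python sketch below assumes exponential lifetimes with mean `mean_life`, which is a choice made here for illustration (the text leaves $G$ arbitrary), and the function names are likewise hypothetical. It evaluates a truncated version of the sum in (4.6) and compares it with the Poisson probability with mean $\lambda p t$.

```python
import math

def survival_probability(t, mean_life):
    """p in (4.5) when G(y) = 1 - exp(-y / mean_life) (an illustrative choice)."""
    return mean_life * (1.0 - math.exp(-t / mean_life)) / t

def pmf_by_mixture(m, lam, t, p, n_max=300):
    """Truncated sum (4.6): binomial thinning of a Poisson number of creations."""
    total = 0.0
    for n in range(m, n_max + 1):
        binom = math.comb(n, m) * p ** m * (1 - p) ** (n - m)
        log_poisson = -lam * t + n * math.log(lam * t) - math.lgamma(n + 1)
        total += binom * math.exp(log_poisson)
    return total

def pmf_poisson(m, mean):
    """Poisson probability e^{-mean} mean^m / m!."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

if __name__ == "__main__":
    lam, t, mean_life = 3.0, 2.0, 0.5
    p = survival_probability(t, mean_life)
    for m in range(6):
        print(m, pmf_by_mixture(m, lam, t, p), pmf_poisson(m, lam * p * t))
    # Long-run mean: lam * p * t approaches lam * mean_life as t grows.
    print(lam * survival_probability(50.0, mean_life) * 50.0, lam * mean_life)
```

The last line illustrates the long-run statement: for large $t$ the mean $\lambda p t$ is essentially $\lambda\mu$.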
4.1. Shot Noise


The shot noise process is a model for fluctuations in electrical currents
that are due to chance arrivals of electrons to an anode. Variants of the
phenomenon arise in physics and communication engineering. Assume:


(1) Electrons arrive at an anode according to a Poisson process
$\{X(t); t \ge 0\}$ of constant rate $\lambda$;

(2) An arriving electron produces a current whose intensity $x$ time
units after arrival is given by the impulse response function $h(x)$.


The intensity of the current at time $t$ is, then, the shot noise

$$I(t) = \sum_{k=1}^{X(t)} h(t - W_k), \tag{4.8}$$

where $W_1, W_2, \ldots$ are the arrival times of the electrons.
Common impulse response functions include triangles, rectangles, and
decaying exponentials of the form

$$h(x) = e^{-\theta x}, \qquad x > 0,$$

where $\theta > 0$ is a parameter, and power law shot noise for which

$$h(x) = x^{-\theta}, \qquad \text{for } x > 0.$$


We will show that for a fixed time point $t$, the shot noise $I(t)$ has the same
probability distribution as a certain random sum that we now describe.
Independent of the Poisson process $X(t)$, let $U_1, U_2, \ldots$ be independent random
variables, each uniformly distributed over the interval $(0, t]$, and define
$\varepsilon_k = h(U_k)$ for $k = 1, 2, \ldots$. The claim is that $I(t)$ has the same
probability distribution as the random sum

$$S(t) = \varepsilon_1 + \cdots + \varepsilon_{X(t)}. \tag{4.9}$$

With this result in hand, the mean, variance, and distribution of the shot
noise $I(t)$ may be readily obtained using the results on random sums
developed in II, Section 3. II, (3.9), for example, immediately gives us

$$E[I(t)] = E[S(t)] = \lambda t\,E[h(U_1)] = \lambda \int_0^t h(u)\,du$$

and

$$\begin{aligned}
\operatorname{Var}[I(t)] &= \lambda t\{\operatorname{Var}[h(U_1)] + E[h(U_1)]^2\} \\
 &= \lambda t\,E[h(U_1)^2] = \lambda \int_0^t h(u)^2\,du.
\end{aligned}$$
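In order to establish that the shot noise $I(t)$ and the random sum $S(t)$ share
the same probability distribution, we need to show that $\Pr\{I(t) \le x\} =
\Pr\{S(t) \le x\}$ for a fixed $t > 0$. Begin with

$$\begin{aligned}
\Pr\{I(t) \le x\} &= \Pr\left\{\sum_{k=1}^{X(t)} h(t - W_k) \le x\right\} \\
 &= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{X(t)} h(t - W_k) \le x \,\Big|\, X(t) = n\right\} \Pr\{X(t) = n\} \\
 &= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{n} h(t - W_k) \le x \,\Big|\, X(t) = n\right\} \Pr\{X(t) = n\},
\end{aligned}$$

and now invoking Theorem 4.1,

$$\begin{aligned}
 &= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{n} h(t - U_k) \le x\right\} \Pr\{X(t) = n\} \\
 &= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{n} h(U_k) \le x\right\} \Pr\{X(t) = n\}
 \qquad (\text{because } U_k \text{ and } t - U_k \text{ have the same distribution}) \\
 &= \sum_{n=0}^{\infty} \Pr\{\varepsilon_1 + \cdots + \varepsilon_n \le x\} \Pr\{X(t) = n\} \\
 &= \Pr\{\varepsilon_1 + \cdots + \varepsilon_{X(t)} \le x\} \\
 &= \Pr\{S(t) \le x\},
\end{aligned}$$

which completes the claim.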
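
The mean and variance formulas for the shot noise reduce to two one-dimensional integrals, which are easy to evaluate numerically for any impulse response. The Python sketch below is a minimal illustration, assuming a simple midpoint rule and hypothetical function names; it is checked against the closed forms for the decaying exponential $h(x) = e^{-\theta x}$, namely $\lambda(1 - e^{-\theta t})/\theta$ and $\lambda(1 - e^{-2\theta t})/(2\theta)$.

```python
import math

def shot_noise_moments(h, lam, t, steps=100000):
    """Return (mean, variance) of I(t): lam * int_0^t h(u) du and lam * int_0^t h(u)^2 du.

    The integrals are approximated with the midpoint rule on `steps` subintervals.
    """
    du = t / steps
    first = second = 0.0
    for i in range(steps):
        value = h((i + 0.5) * du)
        first += value * du
        second += value * value * du
    return lam * first, lam * second

if __name__ == "__main__":
    theta, lam, t = 2.0, 3.0, 1.5  # illustrative values
    mean, var = shot_noise_moments(lambda x: math.exp(-theta * x), lam, t)
    print(mean, lam * (1.0 - math.exp(-theta * t)) / theta)
    print(var, lam * (1.0 - math.exp(-2.0 * theta * t)) / (2.0 * theta))
```

The same function can be applied to a triangular or rectangular impulse response by passing a different `h`; a power law $h(x) = x^{-\theta}$ may have a divergent integral near zero and would need separate care.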


4.2. Sum Quota Sampling 


A common procedure in statistical inference is to observe a fixed number
$n$ of independent and identically distributed random variables $X_1, \ldots, X_n$
and use their sample mean

$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

as an estimate of the population mean or expected value $E[X_1]$. But suppose
we are asked to advise an airline that wishes to estimate the failure
rate in service of a particular component, or, what is nearly the same thing, 
to estimate the mean service life of the part. The airline monitors a new 
plane for two years and observes that the original component lasted 7 
months before failing. Its replacement lasted 5 months, and the third com- 
ponent lasted 9 months. No further failures were observed during the re- 
maining 3 months of the observation period. Is it correct to estimate the 
mean life in service as the observed average (7 + 5 + 9)/3 = 7 months? 

This airline scenario provides a realistic example of a situation in which
the sample size is not fixed in advance but is determined by a preassigned
quota $t > 0$. In sum quota sampling, a sequence of independent and identically
distributed nonnegative random variables $X_1, X_2, \ldots$ is observed
sequentially, the sampling continuing as long as the sum of the observations
is less than the quota $t$. Let this random sample size be denoted by $N(t)$.
Formally,

$$N(t) = \max\{n \ge 0;\ X_1 + \cdots + X_n < t\}.$$

The sample mean is

$$\bar{X}_{N(t)} = \frac{W_{N(t)}}{N(t)} = \frac{X_1 + \cdots + X_{N(t)}}{N(t)}.$$

Of course it is possible that $X_1 \ge t$, and then $N(t) = 0$, and the sample
mean is undefined. Thus, we must assume, or condition on, the event that
$N(t) \ge 1$. An important question in statistical theory is whether or not this
sample mean is unbiased. That is, how does the expected value of this
sample mean relate to the expected value of, say, $X_1$?

In general, the determination of the expected value of the sample mean
under sum quota sampling is very difficult. It can be carried out, however,
in the special case in which the individual $X$ summands are exponentially
distributed with common parameter $\lambda$, so that $N(t)$ is a Poisson process.
One hopes that the results in the special case will shed some light on the
behavior of the sample mean under other distributions.

The key is the use of Theorem 4.1 to evaluate the conditional
expectation

$$\begin{aligned}
E[W_{N(t)} \mid N(t) = n] &= E[\max\{U_1, \ldots, U_n\}] \\
 &= t\left(\frac{n}{n + 1}\right),
\end{aligned}$$
nt 1 
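where $U_1, \ldots, U_n$ are independent and uniformly distributed over the
interval $(0, t]$. Note also that

$$\Pr\{N(t) = n \mid N(t) > 0\} = \frac{(\lambda t)^n e^{-\lambda t}}{n!\,(1 - e^{-\lambda t})}.$$

Then

$$\begin{aligned}
E\left[\frac{W_{N(t)}}{N(t)} \,\Big|\, N(t) > 0\right]
 &= \sum_{n=1}^{\infty} E\left[\frac{W_{N(t)}}{n} \,\Big|\, N(t) = n\right] \Pr\{N(t) = n \mid N(t) > 0\} \\
 &= \sum_{n=1}^{\infty} t\left(\frac{n}{n + 1}\right)\left(\frac{1}{n}\right) \frac{(\lambda t)^n e^{-\lambda t}}{n!\,(1 - e^{-\lambda t})} \\
 &= \frac{1}{\lambda}\left(\frac{1}{e^{\lambda t} - 1}\right) \sum_{n=1}^{\infty} \frac{(\lambda t)^{n+1}}{(n + 1)!} \\
 &= \frac{1}{\lambda}\left(\frac{1}{e^{\lambda t} - 1}\right)\left(e^{\lambda t} - 1 - \lambda t\right) \\
 &= \frac{1}{\lambda}\left(1 - \frac{\lambda t}{e^{\lambda t} - 1}\right).
\end{aligned}$$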

We can perhaps more clearly see the effect of the sum quota sampling if
we express the preceding calculation in terms of the ratio of the bias to the
true mean $E[X_1] = 1/\lambda$. We then have

$$\frac{E[X_1] - E[\bar{X}_{N(t)} \mid N(t) > 0]}{E[X_1]} = \frac{\lambda t}{e^{\lambda t} - 1} = \frac{E[N(t)]}{e^{E[N(t)]} - 1}.$$

The left side is the fraction of bias, and the right side expresses this fraction
bias as a function of the expected sample size under sum quota sampling.
The following table relates some values:


Fraction Bias    E[N(t)]

0.58             1
0.31             2
0.16             3
0.07             4
0.03             5
0.015            6
0.0005           10
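
The fraction bias is a one-line function of the expected sample size, so the table is easy to recompute. The Python sketch below is illustrative; the function names are chosen here, and `corrected_mean` follows the text's informal suggestion of treating the observed count as if it were $E[N(t)]$, which is an approximation rather than an exact correction.

```python
import math

def fraction_bias(expected_sample_size):
    """lam*t / (exp(lam*t) - 1), where lam*t = E[N(t)]."""
    return expected_sample_size / math.expm1(expected_sample_size)

def corrected_mean(sample_mean, expected_sample_size):
    """Divide the observed sample mean by (1 - fraction bias)."""
    return sample_mean / (1.0 - fraction_bias(expected_sample_size))

if __name__ == "__main__":
    for size in (1, 2, 3, 4, 5, 6, 10):
        print(size, fraction_bias(size))
    # Airline example: observed mean of 7 months with 3 failures.
    print(corrected_mean(7.0, 3))
```

With the unrounded bias for an expected sample size of 3, the corrected estimate comes out slightly below the value obtained from the rounded figure of 16%.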
In the airline example, we observed $N(t) = 3$ failures in the two-year
period, and upon consulting the above table, we might estimate the fraction
bias to be something on the order of $-16\%$. Since we observed
$\bar{X}_{N(t)} = 7$, a more accurate estimate of the mean time between failures
(MTBF $= E[X_1]$) might be $7/.84 = 8.33$, an estimate that attempts to correct,
at least on average, for the bias due to the sampling method.

Looking once again at the table, we may conclude in general that the
bias due to sum quota sampling can be made acceptably small by choosing
the quota $t$ sufficiently large so that, on average, the sample size so selected
is reasonably large. If the individual observations are exponentially
distributed, the bias can be kept within 0.05% of the true value, provided
that the quota $t$ is large enough to give an average sample size of 10 or
more.


Exercises 


4.1. Let $\{X(t); t \ge 0\}$ be a Poisson process of rate $\lambda$. Suppose it is
known that $X(1) = n$. For $n = 1, 2, \ldots$, determine the mean of the first
arrival time $W_1$.


4.2. Let $\{X(t); t \ge 0\}$ be a Poisson process of rate $\lambda$. Suppose it is
known that $X(1) = 2$. Determine the mean of $W_1 W_2$, the product of the first
two arrival times.


4.3. Customers arrive at a certain facility according to a Poisson process
of rate $\lambda$. Suppose that it is known that five customers arrived in the first
hour. Determine the mean total waiting time $E[W_1 + W_2 + \cdots + W_5]$.


4.4. Customers arrive at a service facility according to a Poisson
process of intensity $\lambda$. The service times $Y_1, Y_2, \ldots$ of the arriving
customers are independent random variables having the common probability
distribution function $G(y) = \Pr\{Y_k \le y\}$. Assume that there is no limit to
the number of customers that can be serviced simultaneously; i.e., there is
an infinite number of servers available. Let $M(t)$ count the number of
customers in the system at time $t$. Argue that $M(t)$ has a Poisson distribution
with mean $\lambda p t$, where

$$p = t^{-1} \int_0^t [1 - G(y)]\,dy.$$

4.5. Customers arrive at a certain facility according to a Poisson process
of rate $\lambda$. Suppose that it is known that five customers arrived in the first
hour. Each customer spends a time in the store that is a random variable,
exponentially distributed with parameter $\alpha$ and independent of the other
customer times, and then departs. What is the probability that the store is
empty at the end of this first hour?


Problems 


4.1. Let $W_1, W_2, \ldots$ be the event times in a Poisson process $\{X(t);
t \ge 0\}$ of rate $\lambda$. Suppose it is known that $X(1) = n$. For $k < n$, what is the
conditional density function of $W_1, \ldots, W_{k-1}, W_{k+1}, \ldots, W_n$, given that
$W_k = w$?


4.2. Let $\{N(t); t \ge 0\}$ be a Poisson process of rate $\lambda$, representing the
arrival process of customers entering a store. Each customer spends a duration
in the store that is a random variable with cumulative distribution
function $G$. The customer durations are independent of each other and of
the arrival process. Let $X(t)$ denote the number of customers remaining in
the store at time $t$, and let $Y(t)$ be the number of customers who have
arrived and departed by time $t$. Determine the joint distribution of $X(t)$ and
$Y(t)$.


4.3. Let $W_1, W_2, \ldots$ be the waiting times in a Poisson process $\{X(t);
t \ge 0\}$ of rate $\lambda$. Under the condition that $X(1) = 3$, determine the joint
distribution of $U = W_1/W_2$ and $V = (1 - W_3)/(1 - W_2)$.


4.4. Let $W_1, W_2, \ldots$ be the waiting times in a Poisson process $\{X(t);
t \ge 0\}$ of rate $\lambda$. Independent of the process, let $Z_1, Z_2, \ldots$ be independent
and identically distributed random variables with common probability
density function $f(x)$, $0 < x < \infty$. Determine $\Pr\{Z > z\}$, where

$$Z = \min\{W_1 + Z_1, W_2 + Z_2, \ldots\}.$$


4.5. Let $W_1, W_2, \ldots$ be the waiting times in a Poisson process $\{N(t);
t \ge 0\}$ of rate $\lambda$. Determine the limiting distribution of $W_1$, under the
condition that $N(t) = n$ as $n \to \infty$ and $t \to \infty$ in such a way that $n/t = \beta > 0$.
4.6. Customers arrive at a service facility according to a Poisson
process of rate $\lambda$ customers/hour. Let $X(t)$ be the number of customers that
have arrived up to time $t$. Let $W_1, W_2, \ldots$ be the successive arrival times
of the customers.


(a) Determine the conditional mean $E[W_1 \mid X(t) = 2]$.

(b) Determine the conditional mean $E[W_3 \mid X(t) = 5]$.

(c) Determine the conditional probability density function for $W_2$,
given that $X(t) = 5$.


4.7. Let $W_1, W_2, \ldots$ be the event times in a Poisson process $\{X(t);
t \ge 0\}$ of rate $\lambda$, and let $f(w)$ be an arbitrary function. Verify that

$$E\left[\sum_{i=1}^{X(t)} f(W_i)\right] = \lambda \int_0^t f(w)\,dw.$$


4.8. Electrical pulses with independent and identically distributed
random amplitudes $\xi_1, \xi_2, \ldots$ arrive at a detector at random times $W_1,
W_2, \ldots$ according to a Poisson process of rate $\lambda$. The detector output $\theta_k(t)$
for the $k$th pulse at time $t$ is

$$\theta_k(t) = \begin{cases} 0 & \text{for } t < W_k, \\ \xi_k \exp\{-\alpha(t - W_k)\} & \text{for } t \ge W_k. \end{cases}$$

That is, the amplitude impressed on the detector when the pulse arrives is
$\xi_k$, and its effect thereafter decays exponentially at rate $\alpha$. Assume that the
detector is additive, so that if $N(t)$ pulses arrive during the time interval
$[0, t]$, then the output at time $t$ is

$$Z(t) = \sum_{k=1}^{N(t)} \theta_k(t).$$

Determine the mean output $E[Z(t)]$ assuming $N(0) = 0$. Assume that the
amplitudes $\xi_1, \xi_2, \ldots$ are independent of the arrival times $W_1, W_2, \ldots$.


4.9. Customers arrive at a service facility according to a Poisson process
of rate $\lambda$ customers per hour. Let $N(t)$ be the number of customers that
have arrived up to time $t$, and let $W_1, W_2, \ldots$ be the successive arrival times
of the customers. Determine the expected value of the product of the waiting
times up to time $t$. (Assume that $W_1 W_2 \cdots W_{N(t)} = 1$ when $N(t) = 0$.)
4.10. Compare and contrast the example immediately following Theorem 4.1,
the shot noise process of Section 4.1, and the model of Problem
4.8. Can you formulate a general process of which these three examples
are special cases?


4.11. Computer Challenge  Let $U_0, U_1, \ldots$ be independent random
variables, each uniformly distributed on the interval (0, 1). Define a
stochastic process $\{S_n\}$ recursively by setting

$$S_0 = 0 \quad\text{and}\quad S_{n+1} = U_n(1 + S_n) \quad \text{for } n \ge 0.$$

(This is an example of a discrete-time, continuous-state, Markov process.)
When $n$ becomes large, the distribution of $S_n$ approaches that of a random
variable $S = S_\infty$, and $S$ must have the same probability distribution as
$U(1 + S)$, where $U$ and $S$ are independent. We write this in the form

$$S \overset{\mathcal{D}}{=} U(1 + S),$$

from which it is easy to determine that $E[S] = 1$, $\operatorname{Var}[S] = \tfrac{1}{2}$, and even (the
Laplace transform)

$$E[e^{-\theta S}] = \exp\left\{-\int_{0+}^{\theta} \frac{1 - e^{-u}}{u}\,du\right\}, \qquad \theta > 0.$$

The probability density function $f(s)$ satisfies

$$f(s) = 0 \ \text{ for } s \le 0, \quad\text{and}\quad \frac{df}{ds} = -\frac{1}{s}\,f(s - 1), \ \text{ for } s > 0.$$

What is the 99th percentile of the distribution of $S$? (Note: Consider the shot
noise process of Section 4.1. When the Poisson process has rate $\lambda = 1$ and
the impulse response function is the exponential $h(x) = \exp\{-x\}$, then the
shot noise $I(t)$ has, in the limit for large $t$, the same distribution as $S$.)


5. Spatial Poisson Processes


In this section we define some versions of multidimensional Poisson 
processes and describe some examples and applications. 
Let $S$ be a set in $n$-dimensional space and let $\mathcal{A}$ be a family of subsets
of $S$. A point process in $S$ is a stochastic process $N(A)$ indexed by the sets
$A$ in $\mathcal{A}$ and having the set of nonnegative integers $\{0, 1, 2, \ldots\}$ as its
possible values. We think of "points" being scattered over $S$ in some random
manner and of $N(A)$ as counting the number of points in the set $A$. Because
$N(A)$ is a counting function, there are certain obvious requirements that it
must satisfy. For example, if $A$ and $B$ are disjoint sets in $\mathcal{A}$ whose union
$A \cup B$ is also in $\mathcal{A}$, then it must be that $N(A \cup B) = N(A) + N(B)$. In
words, the number of points in $A$ or $B$ equals the number of points in $A$
plus the number of points in $B$ when $A$ and $B$ are disjoint.

The one-dimensional case, in which $S$ is the positive half line and $\mathcal{A}$
comprises all intervals of the form $A = (s, t]$, for $0 \le s < t$, was introduced
in Section 3. The straightforward generalization to the plane and
three-dimensional space that is now being discussed has relevance when
we consider the spatial distribution of stars or galaxies in astronomy, of
plants or animals in ecology, of bacteria on a slide in medicine, and of
defects on a surface or in a volume in reliability engineering.

Let $S$ be a subset of the real line, two-dimensional plane, or three-dimensional
space; let $\mathcal{A}$ be the family of subsets of $S$, and for any set $A$
in $\mathcal{A}$, let $|A|$ denote the size (length, area, or volume, respectively) of $A$.
Then $\{N(A); A \text{ in } \mathcal{A}\}$ is a homogeneous Poisson point process of intensity
$\lambda > 0$ if


(i) for each $A$ in $\mathcal{A}$, the random variable $N(A)$ has a Poisson distribution
with parameter $\lambda|A|$;

(ii) for every finite collection $\{A_1, \ldots, A_n\}$ of disjoint subsets of $S$, the
random variables $N(A_1), \ldots, N(A_n)$ are independent.


In Section 2, the law of rare events was invoked to derive the Poisson
process as a consequence of certain physically plausible postulates. This
implication serves to justify the Poisson process as a model in those situations
where the postulates may be expected to hold. An analogous result
is available in the multidimensional case at hand. Given an arbitrary point
process $\{N(A); A \text{ in } \mathcal{A}\}$, the required postulates are as follows:


(1) The possible values for $N(A)$ are the nonnegative integers
$\{0, 1, 2, \ldots\}$ and $0 < \Pr\{N(A) = 0\} < 1$ if $0 < |A| < \infty$.

(2) The probability distribution of $N(A)$ depends on the set $A$ only
through its size (length, area, or volume) $|A|$, with the further property
that $\Pr\{N(A) \ge 1\} = \lambda|A| + o(|A|)$ as $|A| \downarrow 0$.

(3) For $m = 2, 3, \ldots$, if $A_1, A_2, \ldots, A_m$ are disjoint regions, then $N(A_1)$,
$N(A_2), \ldots, N(A_m)$ are independent random variables and
$N(A_1 \cup A_2 \cup \cdots \cup A_m) = N(A_1) + N(A_2) + \cdots + N(A_m)$.

(4)
$$\lim_{|A| \to 0} \frac{\Pr\{N(A) \ge 1\}}{\Pr\{N(A) = 1\}} = 1.$$

The motivation and interpretation of these postulates is quite evident.
Postulate 2 asserts that the probability distribution of $N(A)$ does not depend
on the shape or location of $A$, but only on its size. Postulate 3 requires
that the outcome in one region not influence or be influenced by the
outcome in a second region that does not overlap the first. Postulate 4
precludes the possibility of two points occupying the same location.

If a random point process $N(A)$ defined with respect to subsets $A$ of
Euclidean $n$-space satisfies Postulates 1 through 4, then $N(A)$ is a homogeneous
Poisson point process of intensity $\lambda > 0$, and

$$\Pr\{N(A) = k\} = \frac{e^{-\lambda|A|} (\lambda|A|)^k}{k!} \qquad \text{for } k = 0, 1, \ldots. \tag{5.1}$$

As in the one-dimensional case, homogeneous Poisson point processes 
in n-dimensions are highly amenable to analysis, and many results are 
known for them. We elaborate a few of these consequences next, beginning
with the uniform distribution of a single point. Consider a region $A$
of positive size $|A| > 0$, and suppose it is known that $A$ contains exactly
one point; i.e., $N(A) = 1$. Where in $A$ is this point located? We claim that
the point is uniformly distributed in the sense that

$$\Pr\{N(B) = 1 \mid N(A) = 1\} = \frac{|B|}{|A|} \quad \text{for any set } B \subset A. \tag{5.2}$$

In words, the probability of the point being in any subset B of A is pro- 
portional to the size of B; that is, the point is uniformly distributed in A. 
The uniform distribution expressed in (5.2) is an immediate consequence 
of elementary conditional probability manipulations. We write $A = B \cup C$,
where $B$ is an arbitrary subset of $A$ and $C$ is the portion of $A$ not included
in $B$. Then $B$ and $C$ are disjoint, so that $N(B)$ and $N(C)$ are independent
Poisson random variables with respective means $\lambda|B|$ and $\lambda|C|$. Then

$$\begin{aligned}
\Pr\{N(B) = 1 \mid N(A) = 1\} &= \frac{\Pr\{N(B) = 1,\ N(C) = 0\}}{\Pr\{N(A) = 1\}} \\
&= \frac{\lambda|B| e^{-\lambda|B|}\, e^{-\lambda|C|}}{\lambda|A| e^{-\lambda|A|}} \\
&= \frac{|B|}{|A|} \quad (\text{because } |B| + |C| = |A|),
\end{aligned}$$

and the proof is complete.
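
The cancellation in this proof can be checked numerically. The following
is a minimal sketch, not part of the original development; the function
names and the numbers (intensity 2.5, $|A| = 4$, $|B| = 1$) are
hypothetical choices made only for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Pr{N = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def prob_point_in_subset(lam, size_b, size_a):
    """Pr{N(B) = 1 | N(A) = 1} for B inside A, from independent Poisson counts."""
    size_c = size_a - size_b  # C is the part of A not included in B
    joint = poisson_pmf(1, lam * size_b) * poisson_pmf(0, lam * size_c)
    return joint / poisson_pmf(1, lam * size_a)

# Hypothetical values: intensity 2.5, |A| = 4.0, |B| = 1.0.
ratio = prob_point_in_subset(2.5, 1.0, 4.0)
print(ratio, 1.0 / 4.0)  # the two numbers should agree up to rounding
```

Changing the intensity leaves the ratio unchanged, which is the content
of (5.2): only the relative size of $B$ matters.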

The generalization to $n$ points in a region $A$ is stated as follows. Consider
a set $A$ of positive size $|A| > 0$ and containing $N(A) = n \ge 1$ points. Then
these $n$ points are independent and uniformly distributed in $A$ in the sense
that for any disjoint partition $A_1, \ldots, A_m$ of $A$, where
$A_1 \cup \cdots \cup A_m = A$, and any positive integers $k_1, \ldots, k_m$,
where $k_1 + \cdots + k_m = n$, we have

$$\Pr\{N(A_1) = k_1, \ldots, N(A_m) = k_m \mid N(A) = n\}
= \frac{n!}{k_1! \cdots k_m!} \left(\frac{|A_1|}{|A|}\right)^{k_1} \cdots \left(\frac{|A_m|}{|A|}\right)^{k_m}. \tag{5.3}$$

Equation (5.3) expresses the multinomial distribution for the conditional
distribution of $N(A_1), \ldots, N(A_m)$ given that $N(A) = n$.
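
As a hedged illustration of (5.3), the sketch below evaluates the
multinomial expression and compares it with the same conditional
probability computed directly from independent Poisson counts. The
function names, the cell sizes (1, 2, 3), the counts (2, 1, 1), and the
intensity 0.7 are assumptions introduced here, not taken from the text.

```python
import math

def poisson_pmf(k, mean):
    """Pr{N = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def multinomial_prob(counts, sizes):
    """Multinomial probability with cell chances |A_i| / |A|, as in (5.3)."""
    n = sum(counts)
    total = sum(sizes)
    coef = math.factorial(n)
    prob = 1.0
    for k, size in zip(counts, sizes):
        coef //= math.factorial(k)      # stays an integer at every step
        prob *= (size / total) ** k
    return coef * prob

def direct_conditional_prob(counts, sizes, lam):
    """Product of independent Poisson probabilities divided by Pr{N(A) = n}."""
    joint = 1.0
    for k, size in zip(counts, sizes):
        joint *= poisson_pmf(k, lam * size)
    return joint / poisson_pmf(sum(counts), lam * sum(sizes))

# Hypothetical partition with sizes 1, 2, 3 and counts 2, 1, 1.
print(multinomial_prob((2, 1, 1), (1.0, 2.0, 3.0)))
print(direct_conditional_prob((2, 1, 1), (1.0, 2.0, 3.0), lam=0.7))
```

The intensity cancels in the direct computation, so any positive value of
`lam` should reproduce the multinomial probability.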


Example An Application in Astronomy Consider stars distributed in
space in accordance with a three-dimensional Poisson point process of
intensity $\lambda > 0$. Let $x$ and $y$ designate general three-dimensional
vectors, and assume that the light intensity exerted at $x$ by a star located
at $y$ is
$f(x, y, \alpha) = \alpha/\|x - y\|^2 = \alpha/[(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2]$,
where $\alpha$ is a random parameter depending on the intensity of the star at
$y$. We assume that the intensities $\alpha$ associated with different stars are
independent, identically distributed random variables possessing a common
mean $\mu_\alpha$ and variance $\sigma_\alpha^2$. We also assume that the combined
intensity exerted at the point $x$ due to light created by different stars
accumulates additively. Let $Z(x, A)$ denote the total light intensity at the
point $x$ due to signals emanating from all sources located in region $A$. Then

$$Z(x, A) = \sum_{r=1}^{N(A)} f(x, y_r, \alpha_r) = \sum_{r=1}^{N(A)} \frac{\alpha_r}{\|x - y_r\|^2}, \tag{5.4}$$

where $y_r$ is the location of the $r$th star in $A$. We recognize (5.4) as a
random sum, as discussed in II, Section 3.2. Accordingly, we have the mean
intensity at $x$ given by

$$E[Z(x, A)] = (E[N(A)])(E[f(x, y, \alpha)]). \tag{5.5}$$

Note that $E[N(A)] = \lambda|A|$, while because we have assumed $\alpha$ and $y$
to be independent,

$$E[f(x, y, \alpha)] = E[\alpha]\, E[\|x - y\|^{-2}].$$

But as a consequence of the Poisson distribution of stars in space, we may
take $y$ to be uniformly distributed in $A$. Thus

$$E[\|x - y\|^{-2}] = \frac{1}{|A|} \int_A \frac{dy}{\|x - y\|^2}.$$

With $\mu_\alpha = E[\alpha]$, then (5.5) reduces to

$$E[Z(x, A)] = \lambda \mu_\alpha \int_A \frac{dy}{\|x - y\|^2}.$$


Exercises 


5.1. Bacteria are distributed throughout a volume of liquid according to
a Poisson process of intensity $\theta = 0.6$ organisms per mm$^3$. A measuring
device counts the number of bacteria in a 10 mm$^3$ volume of the liquid.
What is the probability that more than two bacteria are in this measured
volume?


5.2. Customer arrivals at a certain service facility follow a Poisson
process of unknown rate. Suppose it is known that 12 customers have
arrived during the first three hours. Let $N_i$ be the number of customers who
arrive during the $i$th hour, $i = 1, 2, 3$. Determine the probability that
$N_1 = 3$, $N_2 = 4$, and $N_3 = 5$.


5.3. Defects (air bubbles, contaminants, chips) occur over the surface of
a varnished tabletop according to a Poisson process at a mean rate of one
defect per top. If two inspectors each check separate halves of a given
table, what is the probability that both inspectors find defects?


Problems 


5.1. A piece of a fibrous composite material is sliced across its circular 
cross section of radius R, revealing fiber ends distributed across the circu- 
lar area according to a Poisson process of rate 100 fibers per cross section. 
The locations of the fibers are measured, and the radial distance of each 
fiber from the center of the circle is computed. What is the probability 
density function of this radial distance X for a randomly chosen fiber? 


5.2. Points are placed on the surface of a circular disk of radius one
according to the following scheme. First, a Poisson distributed random
variable $N$ is observed. If $N = n$, then $n$ random variables
$\theta_1, \ldots, \theta_n$ are independently generated, each uniformly
distributed over the interval $[0, 2\pi)$, and $n$ random variables
$R_1, \ldots, R_n$ are independently generated, each with the triangular
density $f(r) = 2r$, $0 < r < 1$. Finally, the points are located at the
positions with polar coordinates $(R_i, \theta_i)$, $i = 1, \ldots, n$.
What is the distribution of the resulting point process on the disk?


5.3. Let $\{N(A);\ A \subset \mathbb{R}^2\}$ be a homogeneous Poisson point
process in the plane, where the intensity is $\lambda$. Divide the
$(0, t] \times (0, t]$ square into $n^2$ boxes of side length $d = t/n$. Suppose
there is a reaction between two or more points whenever they are located
within the same box. Determine the distribution for the number of reactions,
valid in the limit as $t \to \infty$ and $d \to 0$ in such a way that
$td \to \mu > 0$.


5.4. Consider spheres in three-dimensional space with centers
distributed according to a Poisson distribution with parameter $\lambda|A|$,
where $|A|$ now represents the volume of the set $A$. If the radii of all
spheres are distributed according to $F(r)$ with density $f(r)$ and finite
third moment, show that the number of spheres that cover a point $t$ is a
Poisson random variable with parameter
$\frac{4}{3}\lambda\pi \int_0^\infty r^3 f(r)\,dr$.


5.5. Consider a two-dimensional Poisson process of particles in the
plane with intensity parameter $\nu$. Determine the distribution $F(x)$ of the
distance between a particle and its nearest neighbor. Compute the mean
distance.


5.6. Suppose that stars are distributed in space following a Poisson
point process of intensity $\lambda$. Fix a star alpha and let $R$ be the distance
from alpha to its nearest neighbor. Show that $R$ has the probability density
function

$$f_R(x) = (4\lambda\pi x^2) \exp\left\{-\frac{4\lambda\pi x^3}{3}\right\}, \qquad x > 0.$$


5.7. Consider a collection of circles in the plane whose centers are
distributed according to a spatial Poisson process with parameter
$\lambda|A|$, where $|A|$ denotes the area of the set $A$. (In particular, the
number of centers $\xi(A)$ in the set $A$ follows the distribution law
$\Pr\{\xi(A) = k\} = e^{-\lambda|A|}[(\lambda|A|)^k/k!]$.)
The radius of each circle is assumed to be a random variable independent
of the location of the center of the circle, with density function $f(r)$ and
finite second moment.


(a) Show that $C(r)$, defined to be the number of circles that cover the
origin and have centers at a distance less than $r$ from the origin,
determines a variable-time Poisson process, where the time variable
is now taken to be the distance $r$.


Hint: Prove that an event occurring between $r$ and $r + dr$ (i.e., there is
a circle that covers the origin and whose center is in the ring of radius $r$
to $r + dr$) has probability
$\lambda 2\pi r\,dr \int_r^\infty f(\rho)\,d\rho + o(dr)$, and events occurring
over disjoint intervals constitute independent r.v.'s. Show that $C(r)$
is a variable-time (nonhomogeneous) Poisson process with parameter

$$\lambda(r) = 2\pi\lambda r \int_r^\infty f(\rho)\,d\rho.$$


(b) Show that the number of circles that cover the origin is a Poisson
random variable with parameter $\lambda \int_0^\infty \pi r^2 f(r)\,dr$.


6. Compound and Marked Poisson Processes


Given a Poisson process $X(t)$ of rate $\lambda > 0$, suppose that each event has
associated with it a random variable, possibly representing a value or a
cost. Examples will appear shortly. The successive values $Y_1, Y_2, \ldots$ are
assumed to be independent, independent of the Poisson process, and random
variables sharing the common distribution function

$$G(y) = \Pr\{Y_k \le y\}.$$

A compound Poisson process is the cumulative value process defined by

$$Z(t) = \sum_{k=1}^{X(t)} Y_k \quad \text{for } t \ge 0. \tag{6.1}$$

A marked Poisson process is the sequence of pairs $(W_1, Y_1), (W_2, Y_2), \ldots$,
where $W_1, W_2, \ldots$ are the waiting times or event times in the Poisson
process $X(t)$.

Both compound Poisson and marked Poisson processes appear often as 
models of physical phenomena. 


6.1. Compound Poisson Processes 


Consider the compound Poisson process $Z(t) = \sum_{k=1}^{X(t)} Y_k$. If
$\lambda > 0$ is the rate for the process $X(t)$ and $\mu = E[Y_1]$ and
$\nu^2 = \operatorname{Var}[Y_1]$ are the common mean and variance for
$Y_1, Y_2, \ldots$, then the moments of $Z(t)$ can be determined from the
random sums formulas of II, Section 3.2 and are

$$E[Z(t)] = \lambda\mu t; \qquad \operatorname{Var}[Z(t)] = \lambda(\nu^2 + \mu^2)t. \tag{6.2}$$
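
The moment formulas (6.2) can be compared with a simulation. The sketch
below is only illustrative: the jump law (exponential with mean one, so
that $\mu = 1$ and $\nu^2 = 1$), the rate 2, the horizon 3, the seed, and
all function names are assumptions made here, not part of the text.

```python
import random

def simulate_compound_poisson(lam, t, draw_y, rng):
    """One sample of Z(t): add one Y for each event of a rate-lam Poisson process in (0, t]."""
    total = 0.0
    clock = rng.expovariate(lam)          # time of the first event
    while clock <= t:
        total += draw_y(rng)
        clock += rng.expovariate(lam)     # exponential gap to the next event
    return total

def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

rng = random.Random(12345)
lam, t = 2.0, 3.0
# Assumed jump law: exponential with mean 1, so mu = 1 and nu^2 = 1.
samples = [simulate_compound_poisson(lam, t, lambda r: r.expovariate(1.0), rng)
           for _ in range(20000)]
mean, var = sample_mean_and_variance(samples)
print(mean, lam * 1.0 * t)            # sample mean versus lambda * mu * t
print(var, lam * (1.0 + 1.0) * t)     # sample variance versus lambda * (nu^2 + mu^2) * t
```

With these assumed values the formulas give mean 6 and variance 12; the
simulated values should fall close to them, up to sampling error.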
Examples 


(a) Risk Theory Suppose claims arrive at an insurance company in
accordance with a Poisson process having rate $\lambda$. Let $Y_k$ be the magnitude
of the $k$th claim. Then $Z(t) = \sum_{k=1}^{X(t)} Y_k$ represents the cumulative
amount claimed up to time $t$.

(b) Stock Prices Suppose that transactions in a certain stock take
place according to a Poisson process of rate $\lambda$. Let $Y_k$ denote the change
in market price of the stock between the $k$th and $(k - 1)$st transaction.


The random walk hypothesis asserts that $Y_1, Y_2, \ldots$ are independent
random variables. The random walk hypothesis, which has a history dating
back to 1900, can be deduced formally from certain assumptions describing
a "perfect market."

Then $Z(t) = \sum_{k=1}^{X(t)} Y_k$ represents the total price change up to time $t$.

This stock price model has been proposed as an explanation for why
stock price changes do not follow a Gaussian (normal) distribution.

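The distribution function for the compound Poisson process
$Z(t) = \sum_{k=1}^{X(t)} Y_k$ can be represented explicitly after conditioning
on the values of $X(t)$. Recall the convolution notation

$$G^{(n)}(y) = \Pr\{Y_1 + \cdots + Y_n \le y\} = \int_{-\infty}^{\infty} G^{(n-1)}(y - z)\,dG(z) \tag{6.3}$$

with

$$G^{(0)}(y) = \begin{cases} 1 & \text{for } y \ge 0, \\ 0 & \text{for } y < 0. \end{cases}$$

Then

$$\begin{aligned}
\Pr\{Z(t) \le z\} &= \Pr\left\{\sum_{k=1}^{X(t)} Y_k \le z\right\} \\
&= \sum_{n=0}^{\infty} \Pr\left\{\sum_{k=1}^{X(t)} Y_k \le z \,\Big|\, X(t) = n\right\} \frac{(\lambda t)^n e^{-\lambda t}}{n!} \\
&= \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\, G^{(n)}(z)
\quad (\text{since } X(t) \text{ is independent of } Y_1, Y_2, \ldots).
\end{aligned} \tag{6.4}$$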
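
To make (6.4) concrete, the following sketch evaluates a truncated
version of the series in one case where $G^{(n)}$ is available in closed
form. The choice of exponentially distributed $Y_k$ (so that $G^{(n)}$ is
a gamma distribution function), the truncation at 200 terms, and the
function names are assumptions made for illustration only.

```python
import math

def gamma_cdf_integer_shape(n, rate, z):
    """G^(n)(z) for a sum of n independent exponential(rate) variables."""
    if z < 0:
        return 0.0
    if n == 0:
        return 1.0                      # G^(0) is a unit step at the origin
    x = rate * z
    term, partial = 1.0, 0.0
    for k in range(n):
        partial += term                 # term equals x**k / k!
        term *= x / (k + 1)
    return 1.0 - math.exp(-x) * partial

def compound_poisson_cdf(z, lam, t, rate, n_terms=200):
    """Truncated series (6.4), assuming exponential(rate) jump sizes."""
    mean_count = lam * t
    weight = math.exp(-mean_count)      # Pr{X(t) = 0}
    total = 0.0
    for n in range(n_terms):
        total += weight * gamma_cdf_integer_shape(n, rate, z)
        weight *= mean_count / (n + 1)  # Pr{X(t) = n + 1}
    return total

# Hypothetical values: lambda = 2, t = 1.5, exponential jumps of rate 1.
for z in (0.0, 1.0, 3.0, 10.0):
    print(z, compound_poisson_cdf(z, 2.0, 1.5, 1.0))
```

At $z = 0$ only the $n = 0$ term contributes, so the value reduces to
$e^{-\lambda t}$, the probability that no event has occurred by time $t$.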


Example A Shock Model Let $X(t)$ be the number of shocks to a system
up to time $t$ and let $Y_k$ be the damage or wear incurred by the $k$th
shock. We assume that damage is positive, i.e., that $\Pr\{Y_k \ge 0\} = 1$, and
that the damage accumulates additively, so that
$Z(t) = \sum_{k=1}^{X(t)} Y_k$ represents the total damage sustained up to
time $t$. Suppose that the system continues to operate as long as this total
damage is less than some critical value $a$ and fails in the contrary
circumstance. Let $T$ be the time of system failure. Then

$$\{T > t\} \quad \text{if and only if} \quad \{Z(t) < a\}. \quad \text{(Why?)} \tag{6.5}$$

In view of (6.4) and (6.5), we have

$$\Pr\{T > t\} = \sum_{n=0}^{\infty} \frac{(\lambda t)^n e^{-\lambda t}}{n!}\, G^{(n)}(a).$$

All summands are nonnegative, so we may interchange integration and
summation to get the mean system failure time

$$\begin{aligned}
E[T] &= \int_0^\infty \Pr\{T > t\}\,dt \\
&= \sum_{n=0}^{\infty} \left( \int_0^\infty \frac{(\lambda t)^n e^{-\lambda t}}{n!}\,dt \right) G^{(n)}(a) \\
&= \lambda^{-1} \sum_{n=0}^{\infty} G^{(n)}(a).
\end{aligned}$$


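This expression simplifies greatly in the special case in which
$Y_1, Y_2, \ldots$ are each exponentially distributed according to the density
$g_Y(y) = \mu e^{-\mu y}$ for $y \ge 0$. Then the sum $Y_1 + \cdots + Y_n$ has the
gamma distribution

$$G^{(n)}(z) = 1 - \sum_{k=0}^{n-1} \frac{(\mu z)^k e^{-\mu z}}{k!} = \sum_{k=n}^{\infty} \frac{(\mu z)^k e^{-\mu z}}{k!},$$

and

$$\begin{aligned}
\sum_{n=0}^{\infty} G^{(n)}(a) &= \sum_{n=0}^{\infty} \sum_{k=n}^{\infty} \frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= \sum_{k=0}^{\infty} \sum_{n=0}^{k} \frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= \sum_{k=0}^{\infty} (1 + k) \frac{(\mu a)^k e^{-\mu a}}{k!} \\
&= 1 + \mu a.
\end{aligned}$$

When $Y_1, Y_2, \ldots$ are exponentially distributed, then

$$E[T] = \frac{1 + \mu a}{\lambda}.$$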
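
The closed form can be compared with a direct summation of the series
$\lambda^{-1} \sum_n G^{(n)}(a)$. The sketch below is a check under the
same exponential assumption; the function name, the truncation at 400
terms, and the numerical values ($\lambda = 0.5$, $\mu = 2$, $a = 3$) are
hypothetical.

```python
import math

def mean_failure_time(lam, mu, a, n_terms=400):
    """(1 / lam) * sum over n of G^(n)(a), for exponential(mu) damages."""
    x = mu * a
    pmf = math.exp(-x)      # Poisson(x) probability of the value 0
    tail = 1.0              # G^(0)(a) = 1
    total = 0.0
    for n in range(n_terms):
        total += tail       # add G^(n)(a)
        tail -= pmf         # G^(n+1)(a) = G^(n)(a) - (x**n) * exp(-x) / n!
        pmf *= x / (n + 1)
    return total / lam

lam, mu, a = 0.5, 2.0, 3.0   # hypothetical values
print(mean_failure_time(lam, mu, a), (1.0 + mu * a) / lam)
```

The two printed numbers should agree to many decimal places, since the
neglected terms of the series are extremely small for these values.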
6.2. Marked Poisson Processes 


Again suppose that a random variable $Y_k$ is associated with the $k$th event
in a Poisson process of rate $\lambda$. We stipulate that $Y_1, Y_2, \ldots$ are
independent, independent of the Poisson process, and share the common
distribution function

$$G(y) = \Pr\{Y_k \le y\}.$$

The sequence of pairs $(W_1, Y_1), (W_2, Y_2), \ldots$ is called a marked Poisson
process.

We begin the analysis of marked Poisson processes with one of the
simplest cases. For a fixed value $p$ $(0 < p < 1)$ suppose

$$\Pr\{Y_k = 1\} = p, \qquad \Pr\{Y_k = 0\} = q = 1 - p.$$

Now consider separately the processes of points marked with ones and of
points marked with zeros. In this case we can define the relevant Poisson
processes explicitly by

$$X_1(t) = \sum_{k=1}^{X(t)} Y_k \quad \text{and} \quad X_0(t) = X(t) - X_1(t).$$

Then nonoverlapping increments in $X_1(t)$ are independent random
variables, $X_1(0) = 0$, and finally, Theorem 1.2 applies to assert that
$X_1(t)$ has a Poisson distribution with mean $\lambda p t$. In summary, $X_1(t)$
is a Poisson process with rate $\lambda p$, and the parallel argument shows that
$X_0(t)$ is a Poisson process with rate $\lambda(1 - p)$. What is even more
interesting and surprising is that $X_0(t)$ and $X_1(t)$ are independent
processes! The relevant property to check is that
$\Pr\{X_0(t) = j \text{ and } X_1(t) = k\} = \Pr\{X_0(t) = j\} \times \Pr\{X_1(t) = k\}$
for $j, k = 0, 1, \ldots$. We establish this independence by writing

$$\begin{aligned}
\Pr\{X_0(t) = j,\ X_1(t) = k\} &= \Pr\{X(t) = j + k,\ X_1(t) = k\} \\
&= \Pr\{X_1(t) = k \mid X(t) = j + k\} \Pr\{X(t) = j + k\} \\
&= \frac{(j + k)!}{j!\,k!}\, p^k (1 - p)^j\, \frac{(\lambda t)^{j+k} e^{-\lambda t}}{(j + k)!} \\
&= \left[\frac{e^{-\lambda p t}(\lambda p t)^k}{k!}\right] \left[\frac{e^{-\lambda(1-p)t}(\lambda(1 - p)t)^j}{j!}\right] \\
&= \Pr\{X_1(t) = k\} \Pr\{X_0(t) = j\}
\end{aligned}$$

for $j, k = 0, 1, \ldots$.

Example Customers enter a store according to a Poisson process of rate
$\lambda = 10$ per hour. Independently, each customer buys something with
probability $p = 0.3$ and leaves without making a purchase with probability
$q = 1 - p = 0.7$. What is the probability that during the first hour 9 people
enter the store and that 3 of these people make a purchase and 6 do not?

Let $X_1 = X_1(1)$ be the number of customers who make a purchase during
the first hour and $X_0 = X_0(1)$ be the number of people who do not. Then
$X_1$ and $X_0$ are independent Poisson random variables having respective
rates $0.3(10) = 3$ and $0.7(10) = 7$. According to the Poisson distribution,

$$\Pr\{X_1 = 3\} = \frac{3^3 e^{-3}}{3!} = 0.2240, \qquad \Pr\{X_0 = 6\} = \frac{7^6 e^{-7}}{6!} = 0.1490,$$

and

$$\Pr\{X_1 = 3,\ X_0 = 6\} = \Pr\{X_1 = 3\} \Pr\{X_0 = 6\} = (0.2240)(0.1490) = 0.0334.$$
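
The arithmetic of this example can be reproduced in two ways, which also
illustrates the independence argument: once as a product of the two
thinned Poisson probabilities, and once as nine arrivals followed by a
binomial split. This is a small sketch; the variable and function names
are ours.

```python
import math

def poisson_pmf(k, mean):
    """Pr{N = k} for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

lam, p = 10.0, 0.3

# Route 1: independent thinned streams with means lam * p and lam * (1 - p).
buyers = poisson_pmf(3, lam * p)
browsers = poisson_pmf(6, lam * (1 - p))
product = buyers * browsers

# Route 2: nine arrivals in the hour, then a binomial split of those nine.
joint = poisson_pmf(9, lam) * math.comb(9, 3) * p**3 * (1 - p)**6

print(buyers, browsers, product, joint)
```

The last two printed values should coincide up to rounding, in agreement
with the factorization established above.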
In our study of marked Poisson processes, let us next consider the case
where the value random variables $Y_1, Y_2, \ldots$ are discrete, with possible
values $0, 1, 2, \ldots$ and

$$\Pr\{Y_n = k\} = a_k > 0 \quad \text{for } k = 0, 1, \ldots, \quad \text{with } \sum_k a_k = 1.$$

In Figure 6.1, the original Poisson event times $W_1, W_2, \ldots$ are shown on
the bottom axis. Then a point is placed in the $(t, y)$ plane at $(W_n, Y_n)$ for
every $n$. For every integer $k = 0, 1, 2, \ldots$ one obtains a point process that
corresponds to the times $W_n$ for which $Y_n = k$. The same reasoning as in
the zero-one case applies to imply that each of these processes is Poisson,
the rate for the $k$th process being $\lambda a_k$, and that processes for distinct
values of $k$ are independent.

Figure 6.1 A marked Poisson process. $W_1, W_2, \ldots$ are the event times in
a Poisson process of rate $\lambda$. The random variables $Y_1, Y_2, \ldots$ are
the markings, assumed to be independent and identically distributed,
and independent of the Poisson process.

To state the corresponding decomposition result when the values
$Y_1, Y_2, \ldots$ are continuous random variables requires a higher level of
sophistication, although the underlying ideas are basically the same. To set
the stage for the formal statement, we first define what we mean by a
nonhomogeneous Poisson point process in the plane, thus extending the
homogeneous processes of the previous section. Let $\theta = \theta(x, y)$ be a
nonnegative function defined on a region $S$ in the $(x, y)$ plane. For each
subset $A$ of $S$, let $\mu(A) = \iint_A \theta(x, y)\,dx\,dy$ be the volume under
$\theta(x, y)$ enclosed by $A$. A nonhomogeneous Poisson point process of
intensity function $\theta(x, y)$ is a point process $\{N(A);\ A \subset S\}$ for which


(i) for each subset $A$ of $S$, the random variable $N(A)$ has a Poisson
distribution with mean $\mu(A)$;

(ii) for disjoint subsets $A_1, \ldots, A_m$ of $S$, the random variables
$N(A_1), \ldots, N(A_m)$ are independent.


It is easily seen that the homogeneous Poisson point process of
intensity $\lambda$ corresponds to the function $\theta(x, y)$ being constant, and
$\theta(x, y) = \lambda$ for all $x, y$.

With this definition in hand, we state the appropriate decomposition re- 
sult for general marked Poisson processes. 


Theorem 6.1 Let $(W_1, Y_1), (W_2, Y_2), \ldots$ be a marked Poisson process
where $W_1, W_2, \ldots$ are the waiting times in a Poisson process of rate
$\lambda$ and $Y_1, Y_2, \ldots$ are independent identically distributed continuous
random variables having probability density function $g(y)$. Then
$(W_1, Y_1), (W_2, Y_2), \ldots$ form a two-dimensional nonhomogeneous Poisson
point process in the $(t, y)$ plane, where the mean number of points in a
region $A$ is given by

$$\mu(A) = \iint_A \lambda g(y)\,dy\,dt. \tag{6.6}$$

Figure 6.2 diagrams the scene.


Figure 6.2 A marked Poisson process.


Theorem 6.1 asserts that the numbers of points in disjoint intervals are
independent random variables. For example, the waiting times corresponding
to positive values $Y_1, Y_2, \ldots$ form a Poisson process, as do the times
associated with negative values, and these two processes are independent.


Example Crack Failure The following model is proposed to describe
the failure time of a sheet or volume of material subjected to a constant
stress $\sigma$. The failure time is viewed in two parts, crack initiation and
crack propagation.

Crack initiation occurs according to a Poisson process whose rate per
unit time and unit volume is a constant $\lambda_\sigma > 0$ depending on the stress
level $\sigma$. Crack initiation per unit time then is a Poisson process of rate
$\lambda_\sigma|V|$, where $|V|$ is the volume of material under consideration. We let
$W_1, W_2, \ldots$ be the times of crack initiation.

Once begun, a crack grows at a random rate until it reaches a critical
size, at which instant structural failure occurs. Let $Y_k$ be the time to reach
critical size for the $k$th crack. The cumulative distribution function
$G_\sigma(y) = \Pr\{Y_k \le y\}$ depends on the constant stress level $\sigma$.

We assume that crack initiations are sufficiently sparse as to make
$Y_1, Y_2, \ldots$ independent random variables. That is, we do not allow two
small cracks to join and form a larger one.

The structural failure time $Z$ is the smallest of
$W_1 + Y_1, W_2 + Y_2, \ldots$. It is not necessarily the case that the first
crack to appear will cause system failure. A later crack may grow to
critical size faster.

In the $(t, y)$ plane, the event $\{\min\{W_k + Y_k\} > z\}$ corresponds to no
points falling in the triangle
$A = \{(t, y) : t + y \le z,\ t \ge 0,\ y \ge 0\}$, as shown in Figure 6.3.

Figure 6.3 A crack failure model.

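The number of points $N(A)$ falling in the triangle $A$ has a Poisson
distribution with mean $\mu(A)$ given, according to (6.6), by

$$\begin{aligned}
\mu(A) &= \iint_A \lambda_\sigma|V|\,ds\; g_\sigma(u)\,du \\
&= \int_0^z \lambda_\sigma|V| \left\{ \int_0^{z-s} g_\sigma(u)\,du \right\} ds \\
&= \lambda_\sigma|V| \int_0^z G_\sigma(z - s)\,ds \\
&= \lambda_\sigma|V| \int_0^z G_\sigma(v)\,dv.
\end{aligned}$$

From this we obtain the cumulative distribution function for structural
failure time,

$$\begin{aligned}
\Pr\{Z \le z\} &= 1 - \Pr\{Z > z\} = 1 - \Pr\{N(A) = 0\} \\
&= 1 - \exp\left\{ -\lambda_\sigma|V| \int_0^z G_\sigma(v)\,dv \right\}.
\end{aligned}$$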
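
For a given growth-time distribution $G_\sigma$, the failure-time
distribution can be evaluated numerically. The sketch below uses the
trapezoid rule for the integral and, purely for illustration, assumes an
exponential growth-time law so that a closed form is available for
comparison. The parameter values and function names are hypothetical and
are not part of the model as stated.

```python
import math

def failure_time_cdf(z, rate_per_volume, volume, growth_cdf, steps=2000):
    """Pr{Z <= z} = 1 - exp(-rate * |V| * integral of G over (0, z)), by the trapezoid rule."""
    if z <= 0:
        return 0.0
    h = z / steps
    integral = 0.5 * (growth_cdf(0.0) + growth_cdf(z))
    for i in range(1, steps):
        integral += growth_cdf(i * h)
    integral *= h
    return 1.0 - math.exp(-rate_per_volume * volume * integral)

# Assumed growth-time law, for illustration only: exponential with rate theta.
theta = 0.5

def growth(y):
    return 1.0 - math.exp(-theta * y)

def closed_form(z, rate_per_volume, volume):
    """Same probability using the exact integral z - (1 - exp(-theta z)) / theta."""
    integral = z - (1.0 - math.exp(-theta * z)) / theta
    return 1.0 - math.exp(-rate_per_volume * volume * integral)

print(failure_time_cdf(4.0, 0.3, 2.0, growth), closed_form(4.0, 0.3, 2.0))
```

Any other distribution function could be passed in place of `growth`;
only the numerical integral changes.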


Observe the appearance of the so-called size effect in the model,
wherein the structure volume $|V|$ affects the structural failure time even at
constant stress level $\sigma$. The parameter $\lambda_\sigma$ and distribution function
$G_\sigma(y)$ would require experimental determination.


Example The Strength Versus Length of Filaments It was noted that 
the logarithm of mean tensile strength of brittle fibers, such as boron fila- 
ments, in general varies linearly with the logarithm of the filament length, 
but that this relation did not hold for short filaments. It was suspected that 
the breakdown in the log linear relation might be due to testing or mea- 
surement problems, rather than being an inherent property of short fila- 
ments. Evidence supporting this idea was the observation that short fila- 
ments would break in the test clamps, rather than between them as desired, 
more often than would long filaments. Some means of correcting observed 
mean strengths to account for filaments breaking in, rather than between, 
the clamps was desired. It was decided to compute the ratio between the 
actual mean strength and an ideal mean strength, obtained under the as- 
sumption that there was no stress in the clamps, as a correction factor. 

Since the molecular bonding strength is several orders of magnitude 
higher than generally observed strengths, it was felt that failure typically 
was caused by flaws. There are a number of different types of flaws, both 
internal flaws such as voids, inclusions, and weak grain boundaries, and 
external, or surface, flaws such as notches and cracks that cause stress 
concentrations. Let us suppose that flaws occur independently in a
Poisson manner along the length of the filament. We let $Y_k$ be the strength
of the filament at the $k$th flaw and suppose $Y_k$ has the cumulative
distribution function $G(y)$, $y > 0$. We have plotted this information in
Figure 6.4. The flaws reduce the strength. Opposing the strength is the
stress in the filament. Ideally, the stress should be constant along the
filament between the clamp faces and zero within the clamp. In practice, the
stress tapers off to zero over some positive length in the clamp. As a first
approximation it is reasonable to assume that the stress decreases linearly.
Let $l$ be the length of the clamp and $t$ the distance between the clamps,
called the gauge length, as illustrated in Figure 6.4.

The filament holds as long as the stress has not exceeded the strength
as determined by the weakest flaw. That is, the filament will support a
stress of $y$ as long as no flaw points fall in the stress trapezoid of Figure
6.4. The number of points in this trapezoid has a Poisson distribution with
mean $\mu(B) + 2\mu(A)$. In particular, no points fall there with probability
$e^{-[\mu(B) + 2\mu(A)]}$. If we let $S$ be the strength of the filament, then

$$\Pr\{S > y\} = e^{-2\mu(A) - \mu(B)}.$$

[Figure: stress and strength plotted against distance along the filament,
showing the ideal stress, the actual stress, the $k$th flaw, the filament,
and the clamp.]

Figure 6.4 The stress versus strength of a filament under tension. Flaws
reducing the strength of a filament below its theoretical maximum
$y^*$ are distributed randomly along its length. The stress in the
filament is constant at the level $y$ between clamps and tapers off to
zero within the clamps. The filament fails if at any point along its
length a flaw reduces its strength below its stress.


We compute

$$\mu(A) = \int_0^l \lambda\, G\!\left(\frac{xy}{l}\right) dx$$

and

$$\mu(B) = \lambda t\, G(y).$$

Finally, the mean strength of the filament is

$$E[S] = \int_0^\infty \Pr\{S > y\}\,dy = \int_0^\infty \exp\left\{ -\lambda\left[ t\,G(y) + 2\int_0^l G\!\left(\frac{xy}{l}\right) dx \right] \right\} dy.$$

For an ideal filament we use the same expression but with $l = 0$.


Exercises 


6.1. Customers demanding service at a central processing facility arrive
according to a Poisson process of intensity $\theta = 8$ per unit time.
Independently, each customer is classified as high priority with probability
$\alpha = 0.2$, or low priority with probability $1 - \alpha = 0.8$. What is the
probability that three high priority and five low priority customers arrive
during the first unit of time?


6.2. Shocks occur to a system according to a Poisson process of
intensity $\lambda$. Each shock causes some damage to the system, and these
damages accumulate. Let $N(t)$ be the number of shocks up to time $t$, and let
$Y_i$ be the damage caused by the $i$th shock. Then

$$X(t) = Y_1 + \cdots + Y_{N(t)}$$

is the total damage up to time $t$. Determine the mean and variance of the
total damage at time $t$ when the individual shock damages are
exponentially distributed with parameter $\theta$.


6.3. Let $\{N(t);\ t \ge 0\}$ be a Poisson process of intensity $\lambda$, and let
$Y_1, Y_2, \ldots$ be independent and identically distributed nonnegative random
variables with cumulative distribution function $G(y) = \Pr\{Y \le y\}$.
Determine $\Pr\{Z(t) > z \mid N(t) > 0\}$, where

$$Z(t) = \min\{Y_1, Y_2, \ldots, Y_{N(t)}\}.$$


6.4. Men and women enter a supermarket according to independent
Poisson processes having respective rates of two and four per minute.


(a) Starting at an arbitrary time, what is the probability that at least two
men arrive before the first woman arrives?
(b) What is the probability that at least two men arrive before the third
woman arrives?


6.5. Alpha particles are emitted from a fixed mass of material according
to a Poisson process of rate $\lambda$. Each particle exists for a random duration
and is then annihilated. Suppose that the successive lifetimes
$Y_1, Y_2, \ldots$ of distinct particles are independent random variables having
the common distribution function $G(y) = \Pr\{Y_k \le y\}$. Let $M(t)$ be the
number of particles existing at time $t$. By considering the lifetimes as
markings, identify the region in the lifetime, arrival-time space that
corresponds to $M(t)$, and thereby deduce the probability distribution of
$M(t)$.


Problems 


6.1. Suppose that points are distributed over the half line $[0, \infty)$
according to a Poisson process of rate $\lambda$. A sequence of independent and
identically distributed nonnegative random variables $Y_1, Y_2, \ldots$ is used
to reposition the points so that a point formerly at location $W_k$ is moved
to the location $W_k + Y_k$. Completely describe the distribution of the
relocated points.


6.2. Suppose that particles are distributed on the surface of a circular
region according to a spatial Poisson process of intensity $\lambda$ particles per
unit area. The polar coordinates of each point are determined, and each
angular coordinate is shifted by a random amount, the amounts shifted for
distinct points being independent random variables following a fixed
probability distribution. Show that at the end of the point movement process,
the points are still Poisson distributed over the region.


6.3. Shocks occur to a system according to a Poisson process of
intensity $\lambda$. Each shock causes some damage to the system, and these
damages accumulate. Let $N(t)$ be the number of shocks up to time $t$, and let
$Y_i$ be the damage caused by the $i$th shock. Then

$$X(t) = Y_1 + \cdots + Y_{N(t)}$$

is the total damage up to time $t$. Suppose that the system continues to
operate as long as the total damage is strictly less than some critical value
$a$, and fails in the contrary circumstance. Determine the mean time to system
failure when the individual damages $Y_i$ have a geometric distribution with
$\Pr\{Y = k\} = p(1 - p)^k$, $k = 0, 1, \ldots$.


6.4. Let $\{X(t);\ t \ge 0\}$ and $\{Y(t);\ t \ge 0\}$ be independent Poisson
processes with respective parameters $\lambda$ and $\mu$. For a fixed integer $a$,
let $T_a = \min\{t \ge 0;\ Y(t) = a\}$ be the random time that the $Y$ process
first reaches the value $a$. Determine $\Pr\{X(T_a) = k\}$ for $k = 0, 1, \ldots$.


Hint: First consider $\xi = X(T_1)$ in the case in which $a = 1$. Then $\xi$ has
a geometric distribution. Then argue that $X(T_a)$ for general $a$ has the
same distribution as the sum of $a$ independent $\xi$'s and hence has a
negative binomial distribution.


6.5. Let $\{X(t);\ t \ge 0\}$ and $\{Y(t);\ t \ge 0\}$ be independent Poisson
processes with respective parameters $\lambda$ and $\mu$. Let
$T = \min\{t \ge 0;\ Y(t) = 1\}$ be the random time of the first event in the
$Y$ process. Determine $\Pr\{X(T/2) = k\}$ for $k = 0, 1, \ldots$.


6.6. Let $W_1, W_2, \ldots$ be the event times in a Poisson process
$\{X(t);\ t \ge 0\}$ of rate $\lambda$. A new point process is created as follows:
Each point $W_k$ is replaced by two new points located at $W_k + X_k$ and
$W_k + Y_k$, where $X_1, Y_1, X_2, Y_2, \ldots$ are independent and identically
distributed nonnegative random variables, independent of the Poisson
process. Describe the distribution of the resulting point process.


6.7. Let $\{N(t);\ t \ge 0\}$ be a Poisson process of intensity $\lambda$, and let
$Y_1, Y_2, \ldots$ be independent and identically distributed nonnegative random
variables with cumulative distribution function

$$G(y) = y^\alpha \quad \text{for } 0 < y < 1.$$

Determine $\Pr\{Z(t) > z \mid N(t) > 0\}$, where

$$Z(t) = \min\{Y_1, Y_2, \ldots, Y_{N(t)}\}.$$

Describe the behavior for large $t$.

6.8. Let $\{N(t);\ t \ge 0\}$ be a nonhomogeneous Poisson process of
intensity $\lambda(t)$, $t > 0$, and let $Y_1, Y_2, \ldots$ be independent and
identically distributed nonnegative random variables with cumulative
distribution function

$$G(y) = y^\alpha \quad \text{for } 0 < y < 1.$$

Suppose that the intensity process averages out in the sense that

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t \lambda(u)\,du = \theta.$$

Let

$$Z(t) = \min\{Y_1, Y_2, \ldots, Y_{N(t)}\}.$$

Determine

$$\lim_{t \to \infty} \Pr\{t^{1/\alpha} Z(t) > z\}.$$

6.9. Let $W_1, W_2, \ldots$ be the event times in a Poisson process of rate
$\lambda$, and let $N(t) = N((0, t])$ be the number of points in the interval
$(0, t]$. Evaluate

$$E\left[ \sum_{k=1}^{N(t)} (W_k)^2 \right].$$

Note: $\sum_{k=1}^{0} (W_k)^2 = 0$.


6.10. A Bidding Model Let $U_1, U_2, \ldots$ be independent random
variables, each uniformly distributed over the interval $(0, 1]$. These random
variables represent successive bids on an asset that you are trying to sell,
and that you must sell by time $t = 1$, when the asset becomes worthless.
As a strategy, you adopt a secret number $\theta$, and you will accept the first
offer that is greater than $\theta$. For example, you accept the second offer if
$U_1 \le \theta$ while $U_2 > \theta$. Suppose that the offers arrive according to a
unit rate Poisson process ($\lambda = 1$).


(a) What is the probability that you sell the asset by time $t = 1$?

(b) What is the value for $\theta$ that maximizes your expected return? (You get
nothing if you don't sell the asset by time $t = 1$.)

(c) To improve your return, you adopt a new strategy, which is to accept
an offer at time $t$ if it exceeds $\theta(t) = (1 - t)/(3 - t)$. What are your
new chances of selling the asset, and what is your new expected
return?


Chapter VI 
Continuous Time Markov Chains 


1. Pure Birth Processes 


In this chapter we present several important examples of continuous time,
discrete state, Markov processes. Specifically, we deal here with a family
of random variables $\{X(t);\ 0 \le t < \infty\}$ where the possible values of
$X(t)$ are the nonnegative integers. We shall restrict attention to the case
where $\{X(t)\}$ is a Markov process with stationary transition probabilities.
Thus, the transition probability function for $t > 0$,

$$P_{ij}(t) = \Pr\{X(t + u) = j \mid X(u) = i\}, \qquad i, j = 0, 1, 2, \ldots,$$

is independent of $u \ge 0$.

It is usually more natural in investigating particular stochastic models
based on physical phenomena to prescribe the so-called infinitesimal
probabilities relating to the process and then derive from them an explicit
expression for the transition probability function. For the case at hand, we
will postulate the form of $P_{ij}(h)$ for $h$ small, and, using the Markov
property, we will derive a system of differential equations satisfied by
$P_{ij}(t)$ for all $t > 0$. The solution of these equations under suitable
boundary conditions gives $P_{ij}(t)$.


By way of introduction to the general pure birth process, we review
briefly the axioms characterizing the Poisson process.


1.1. Postulates for the Poisson Process 


The Poisson process is the prototypical pure birth process. Let us point out 
the relevant properties. The Poisson process is a Markov process on the 
nonnegative integers for which 


(i) $\Pr\{X(t + h) - X(t) = 1 \mid X(t) = x\} = \lambda h + o(h)$ as $h \downarrow 0$
$(x = 0, 1, 2, \ldots)$.
(ii) $\Pr\{X(t + h) - X(t) = 0 \mid X(t) = x\} = 1 - \lambda h + o(h)$ as $h \downarrow 0$.
(iii) $X(0) = 0$.


The precise interpretation of (i) is the relationship

$$\lim_{h \to 0+} \frac{\Pr\{X(t + h) - X(t) = 1 \mid X(t) = x\}}{h} = \lambda.$$

The $o(h)$ symbol represents a negligible remainder term in the sense that
if we divide the term by $h$, then the resulting value tends to zero as $h$ tends
to zero. Notice that the right side of (i) is independent of $x$.


These properties are easily verified by direct computation, since the ex- 
plicit formulas for all the relevant properties are available. Problem 1.13 
calls for showing that these properties, in fact, define the Poisson process. 


1.2. Pure Birth Process 


A natural generalization of the Poisson process is to permit the chance of 
an event occurring at a given instant of time to depend upon the number 
of events that have already occurred. An example of this phenomenon is 
the reproduction of living organisms (and hence the name of the process), 
in which under certain conditions—sufficient food, no mortality, no mi- 
gration, for example—the infinitesimal probability of a birth at a given in- 
stant is proportional (directly) to the population size at that time. This ex- 
ample is known as the Yule process and will be considered in detail later. 

Consider a sequence of positive numbers, $\{\lambda_k\}$. We define a pure birth 
process as a Markov process satisfying the following postulates: 


(1) $\Pr\{X(t + h) - X(t) = 1 \mid X(t) = k\} = \lambda_k h + o_{1,k}(h)$ $(h \to 0+)$. 
(2) $\Pr\{X(t + h) - X(t) = 0 \mid X(t) = k\} = 1 - \lambda_k h + o_{2,k}(h)$. (1.1) 
(3) $\Pr\{X(t + h) - X(t) < 0 \mid X(t) = k\} = 0$ $(k \ge 0)$. 


As a matter of convenience we often add the postulate 
(4) X(0) = 0. 


With this postulate $X(t)$ does not denote the population size but, rather, the 
number of births in the time interval $(0, t]$. 

Note that the left sides of Postulates (1) and (2) are just $P_{k,k+1}(h)$ and 
$P_{k,k}(h)$, respectively (owing to stationarity), so that $o_{1,k}(h)$ and $o_{2,k}(h)$ do not 
depend upon $t$. 

We define $P_n(t) = \Pr\{X(t) = n\}$, assuming $X(0) = 0$. 

By analyzing the possibilities at time $t$ just prior to time $t + h$ ($h$ small), 
we will derive a system of differential equations satisfied by $P_n(t)$ for 
$t \ge 0$, namely 

$$\begin{aligned}
P_0'(t) &= -\lambda_0 P_0(t), \\
P_n'(t) &= -\lambda_n P_n(t) + \lambda_{n-1} P_{n-1}(t) \qquad \text{for } n \ge 1,
\end{aligned} \tag{1.2}$$

with initial conditions 

$$P_0(0) = 1, \qquad P_n(0) = 0, \quad n > 0.$$


Indeed, if $h > 0$, $n \ge 1$, then by invoking the law of total probability, the 
Markov property, and Postulate (3) we obtain 

$$\begin{aligned}
P_n(t + h) &= \sum_{k=0}^{\infty} P_k(t) \Pr\{X(t + h) = n \mid X(t) = k\} \\
&= \sum_{k=0}^{\infty} P_k(t) \Pr\{X(t + h) - X(t) = n - k \mid X(t) = k\} \\
&= \sum_{k=0}^{n} P_k(t) \Pr\{X(t + h) - X(t) = n - k \mid X(t) = k\}.
\end{aligned}$$

Now for $k = 0, 1, \ldots, n - 2$ we have 

$$\begin{aligned}
\Pr\{X(t + h) - X(t) = n - k \mid X(t) = k\} &\le \Pr\{X(t + h) - X(t) \ge 2 \mid X(t) = k\} \\
&= o_{1,k}(h) + o_{2,k}(h),
\end{aligned}$$

or 

$$\Pr\{X(t + h) - X(t) = n - k \mid X(t) = k\} = o_{3,n,k}(h), \qquad k = 0, \ldots, n - 2.$$

Thus 

$$P_n(t + h) = P_n(t)[1 - \lambda_n h + o_{2,n}(h)] + P_{n-1}(t)[\lambda_{n-1} h + o_{1,n-1}(h)] + \sum_{k=0}^{n-2} P_k(t)\, o_{3,n,k}(h),$$

or 

$$P_n(t + h) - P_n(t) = P_n(t)[-\lambda_n h + o_{2,n}(h)] + P_{n-1}(t)[\lambda_{n-1} h + o_{1,n-1}(h)] + o_n(h), \tag{1.3}$$

where, clearly, $\lim_{h \downarrow 0} o_n(h)/h = 0$ uniformly in $t \ge 0$, since $o_n(h)$ is 
bounded by the finite sum $\sum_{k=0}^{n-2} o_{3,n,k}(h)$, which does not depend on $t$. 

Dividing by $h$ and passing to the limit $h \downarrow 0$, we validate the relations 
(1.2), where on the left side we should, to be precise, write the derivative 
from the right. With a little more care, however, we can derive the same 
relation involving the derivative from the left. In fact, from (1.3) we see 
at once that the $P_n(t)$ are continuous functions of $t$. Replacing $t$ by $t - h$ in 
(1.3), dividing by $h$, and passing to the limit $h \downarrow 0$, we find that each $P_n(t)$ 
has a left derivative that also satisfies equation (1.2). 
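
The system (1.2) can also be explored numerically. The following Python sketch integrates (1.2) with a plain forward Euler scheme; the function name `integrate_pure_birth`, the step count, and the test rates are illustrative assumptions, not part of the text. With all birth parameters equal to a constant $\lambda$, the result should agree with the Poisson probabilities of Problem 1.13.

```python
import math

def integrate_pure_birth(rates, t_end, steps=20000):
    """Forward-Euler solution of (1.2) for P_0(t_end), ..., P_{m-1}(t_end).

    rates[n] is the birth parameter lambda_n; the process starts at X(0) = 0.
    """
    m = len(rates)
    p = [1.0] + [0.0] * (m - 1)            # P_0(0) = 1, P_n(0) = 0 for n > 0
    h = t_end / steps
    for _ in range(steps):
        new = [p[0] - h * rates[0] * p[0]]
        for n in range(1, m):
            new.append(p[n] + h * (rates[n - 1] * p[n - 1] - rates[n] * p[n]))
        p = new
    return p

lam, t = 2.0, 1.0                          # hypothetical constant rate and time
numeric = integrate_pure_birth([lam] * 6, t)
poisson = [math.exp(-lam * t) * (lam * t) ** n / math.factorial(n) for n in range(6)]
for n in range(6):
    print(n, round(numeric[n], 5), round(poisson[n], 5))
```

Because each equation in (1.2) involves only $P_n$ and $P_{n-1}$, truncating the system at a finite index introduces no error in the retained components; the only error is that of the Euler step.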

The first equation of (1.2) can be solved immediately and yields 

$$P_0(t) = \exp\{-\lambda_0 t\} \qquad \text{for } t > 0. \tag{1.4}$$

Define $S_k$ as the time between the $k$th and the $(k + 1)$st birth, so that 

$$P_n(t) = \Pr\left\{ \sum_{i=0}^{n-1} S_i \le t < \sum_{i=0}^{n} S_i \right\}.$$

The random variables $S_k$ are called the “sojourn times” between births, 
and 

$$W_k = \sum_{i=0}^{k-1} S_i = \text{the time at which the $k$th birth occurs}.$$

We have already seen that $P_0(t) = \exp\{-\lambda_0 t\}$. Therefore, 

$$\Pr\{S_0 \le t\} = 1 - \Pr\{X(t) = 0\} = 1 - \exp\{-\lambda_0 t\};$$

i.e., $S_0$ has an exponential distribution with parameter $\lambda_0$. It may be deduced 
from Postulates (1) through (4) that $S_k$, $k > 0$, also has an exponential 
distribution with parameter $\lambda_k$ and that the $S_i$'s are mutually independent. 
This description characterizes the pure birth process in terms of its 
sojourn times, in contrast to the infinitesimal description corresponding to 
(1.1). 
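
The sojourn time description translates directly into a simulation. The sketch below is a minimal illustration, assuming a rate function `rates(k)` that returns $\lambda_k$; the names `simulate_pure_birth`, `rng`, and the constant-rate test case are hypothetical choices made for the example.

```python
import random

def simulate_pure_birth(rates, t_end, rng):
    """Return X(t_end) for a pure birth process started at X(0) = 0.

    The sojourn in state k is drawn as an exponential with parameter rates(k).
    """
    state, clock = 0, 0.0
    while True:
        clock += rng.expovariate(rates(state))   # sojourn time S_k
        if clock > t_end:
            return state
        state += 1

rng = random.Random(12345)
lam, t = 1.5, 2.0                                # constant rates give a Poisson process
samples = [simulate_pure_birth(lambda k: lam, t, rng) for _ in range(20000)]
mean_births = sum(samples) / len(samples)
print(mean_births)                               # should lie near lam * t
```

For rate sequences that grow quickly the loop may take very many steps before the clock passes `t_end`, so in practice one would cap the number of births.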
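
To solve the differential equations of (1.2) recursively, introduce 
$Q_n(t) = e^{\lambda_n t} P_n(t)$ for $n = 0, 1, \ldots$. Then 

$$\begin{aligned}
Q_n'(t) &= \lambda_n e^{\lambda_n t} P_n(t) + e^{\lambda_n t} P_n'(t) \\
&= e^{\lambda_n t}[\lambda_n P_n(t) + P_n'(t)] \\
&= e^{\lambda_n t} \lambda_{n-1} P_{n-1}(t) \qquad [\text{using (1.2)}].
\end{aligned}$$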


Integrating both sides of these equations and using the boundary condition 
$Q_n(0) = 0$ for $n \ge 1$ gives 

$$Q_n(t) = \int_0^t e^{\lambda_n x} \lambda_{n-1} P_{n-1}(x)\, dx,$$

or 

$$P_n(t) = \lambda_{n-1} e^{-\lambda_n t} \int_0^t e^{\lambda_n x} P_{n-1}(x)\, dx, \qquad n = 1, 2, \ldots. \tag{1.5}$$

It is now clear that all $P_n(t) \ge 0$, but there is still a possibility that 

$$\sum_{n=0}^{\infty} P_n(t) < 1.$$

To secure the validity of the process, i.e., to assure that $\sum_{n=0}^{\infty} P_n(t) = 1$ for 
all $t$, we must restrict the $\lambda_n$ according to the following: 

$$\sum_{n=0}^{\infty} P_n(t) = 1 \quad \text{if and only if} \quad \sum_{n=0}^{\infty} \frac{1}{\lambda_n} = \infty. \tag{1.6}$$

The intuitive argument for this result is as follows: The time $S_k$ between 
consecutive births is exponentially distributed with a corresponding 
parameter $\lambda_k$. Therefore, the quantity $\sum_n 1/\lambda_n$ equals the expected time before 
the population becomes infinite. By comparison, $1 - \sum_{n=0}^{\infty} P_n(t)$ is the 
probability that $X(t) = \infty$. 

If $\sum_n \lambda_n^{-1} < \infty$, then the expected time for the population to become 
infinite is finite. It is then plausible that for all $t > 0$ the probability that 
$X(t) = \infty$ is positive. 
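
Criterion (1.6) can be made concrete by tabulating partial sums of $1/\lambda_n$. The sketch below compares two hypothetical rate sequences chosen only for illustration: $\lambda_k = k + 1$, for which the sum diverges, and $\lambda_k = (k + 1)^2$, for which it converges (to $\pi^2/6$).

```python
import math

def expected_time_to_reach(rates, n_states):
    """Sum of the mean sojourn times 1/lambda_k for k = 0, ..., n_states - 1."""
    return sum(1.0 / rates(k) for k in range(n_states))

def linear(k):
    return k + 1.0                 # hypothetical rates lambda_k = k + 1

def quadratic(k):
    return (k + 1.0) ** 2          # hypothetical rates lambda_k = (k + 1)^2

for n in (10, 1000, 100000):
    print(n, expected_time_to_reach(linear, n), expected_time_to_reach(quadratic, n))
```

The first column of sums keeps growing (slowly, like a logarithm), while the second stays bounded, which is the situation in which the population can become infinite in finite time.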

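When no two of the birth parameters $\lambda_0, \lambda_1, \ldots$ are equal, the integral 
equation (1.5) may be solved to give the explicit formula 

$$\begin{aligned}
P_0(t) &= e^{-\lambda_0 t}, \\
P_1(t) &= \lambda_0 \left( \frac{1}{\lambda_1 - \lambda_0} e^{-\lambda_0 t} + \frac{1}{\lambda_0 - \lambda_1} e^{-\lambda_1 t} \right),
\end{aligned} \tag{1.7}$$

and 

$$\begin{aligned}
P_n(t) &= \Pr\{X(t) = n \mid X(0) = 0\} \\
&= \lambda_0 \cdots \lambda_{n-1} [B_{0,n} e^{-\lambda_0 t} + \cdots + B_{n,n} e^{-\lambda_n t}] \qquad \text{for } n > 1,
\end{aligned} \tag{1.8}$$

where 

$$\begin{aligned}
B_{0,n} &= \frac{1}{(\lambda_1 - \lambda_0) \cdots (\lambda_n - \lambda_0)}, \\
B_{k,n} &= \frac{1}{(\lambda_0 - \lambda_k) \cdots (\lambda_{k-1} - \lambda_k)(\lambda_{k+1} - \lambda_k) \cdots (\lambda_n - \lambda_k)} \qquad \text{for } 0 < k < n,
\end{aligned} \tag{1.9}$$

and 

$$B_{n,n} = \frac{1}{(\lambda_0 - \lambda_n) \cdots (\lambda_{n-1} - \lambda_n)}.$$

Because $\lambda_j \ne \lambda_k$ when $j \ne k$ by assumption, the denominator in (1.9) does 
not vanish, and $B_{k,n}$ is well-defined. 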
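
Formulas (1.8) and (1.9) are straightforward to evaluate by machine. The following sketch is one possible transcription, assuming the rates are supplied as a Python list with pairwise distinct entries; the function name `pure_birth_prob` is an illustrative choice. The example uses the birth parameters of Exercise 1.1 at an arbitrary time.

```python
import math

def pure_birth_prob(rates, n, t):
    """P_n(t) from (1.8)-(1.9) for X(0) = 0; rates[0..n] must be pairwise distinct."""
    lam = rates[: n + 1]
    total = 0.0
    for k in range(n + 1):
        denom = 1.0
        for j in range(n + 1):
            if j != k:
                denom *= lam[j] - lam[k]            # factors of 1 / B_{k,n}
        total += math.exp(-lam[k] * t) / denom      # B_{k,n} e^{-lambda_k t}
    return math.prod(lam[:n]) * total               # lambda_0 ... lambda_{n-1}

rates = [1.0, 3.0, 2.0, 5.0]                        # parameters of Exercise 1.1
t = 0.7                                             # arbitrary evaluation time
probs = [pure_birth_prob(rates, n, t) for n in range(4)]
print(probs, sum(probs))
```

For $n = 0$ and $n = 1$ the same expression reduces to (1.7), so a single function covers all cases. The partial sum over $n = 0, \ldots, 3$ is less than 1 because states beyond 3 also carry probability.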

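We will verify that $P_1(t)$, as given by (1.7), satisfies (1.5). Equation 
(1.4) gives $P_0(t) = e^{-\lambda_0 t}$. We next substitute this in (1.5) when $n = 1$, 
thereby obtaining 

$$\begin{aligned}
P_1(t) &= \lambda_0 e^{-\lambda_1 t} \int_0^t e^{\lambda_1 x} e^{-\lambda_0 x}\, dx \\
&= \lambda_0 e^{-\lambda_1 t} (\lambda_0 - \lambda_1)^{-1} [1 - e^{-(\lambda_0 - \lambda_1) t}] \\
&= \lambda_0 \left( \frac{1}{\lambda_1 - \lambda_0} e^{-\lambda_0 t} + \frac{1}{\lambda_0 - \lambda_1} e^{-\lambda_1 t} \right),
\end{aligned}$$

in agreement with (1.7). 

The induction proof for general $n$ involves tedious and difficult algebra. 
The case $n = 2$ is suggested as a problem. 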


1.3. The Yule Process 


The Yule process arises in physics and biology and describes the growth 
of a population in which each member has a probability $\beta h + o(h)$ of 
giving birth to a new member during an interval of time of length $h$ ($\beta > 0$). 
Assuming independence and no interaction among members of the 
population, the binomial theorem gives 

$$\begin{aligned}
\Pr\{X(t + h) - X(t) = 1 \mid X(t) = n\} &= \binom{n}{1} [\beta h + o(h)][1 - \beta h + o(h)]^{n-1} \\
&= n\beta h + o_n(h);
\end{aligned}$$

i.e., for the Yule process the infinitesimal parameters are $\lambda_n = n\beta$. In 
words, the total population birth rate is directly proportional to the 
population size, the proportionality constant being the individual birth rate $\beta$. 
As such, the Yule process forms a stochastic analogue of the deterministic 
population growth model represented by the differential equation 
$dy/dt = \alpha y$. In the deterministic model, the rate $dy/dt$ of population growth 
is directly proportional to population size $y$. In the stochastic model, the 
infinitesimal deterministic increase $dy$ is replaced by the probability of a 
unit increase during the infinitesimal time interval $dt$. Similar connections 
between deterministic rates and birth (and death) parameters arise 
frequently in stochastic modeling. Examples abound in this chapter. 
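
The binomial computation behind $\lambda_n = n\beta$ can be checked numerically. In the sketch below each member's birth probability over an interval of length $h$ is taken to be exactly $\beta h$ (the $o(h)$ term is dropped, which is a simplifying assumption), and the values of `n` and `beta` are arbitrary.

```python
def one_birth_probability(n, beta, h):
    """Binomial probability that exactly one of n members gives birth in time h,
    with each member's birth probability taken to be exactly beta * h."""
    p = beta * h
    return n * p * (1.0 - p) ** (n - 1)

n, beta = 10, 0.3
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # the ratio approaches the infinitesimal rate n * beta as h decreases
    print(h, one_birth_probability(n, beta, h) / h)
```

The printed ratios increase toward $n\beta$ as $h$ shrinks, which is the content of the statement that the remainder is $o_n(h)$.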
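The system of equations (1.2) in the case that $X(0) = 1$ becomes 

$$P_n'(t) = -\beta[n P_n(t) - (n - 1) P_{n-1}(t)], \qquad n = 1, 2, \ldots,$$

under the initial conditions 

$$P_1(0) = 1, \qquad P_n(0) = 0, \quad n = 2, 3, \ldots.$$

Its solution is 

$$P_n(t) = e^{-\beta t}(1 - e^{-\beta t})^{n-1}, \qquad n \ge 1, \tag{1.10}$$

as may be verified directly. We recognize (1.10) as the geometric 
distribution I, (3.5) with $p = e^{-\beta t}$. 

The general solution analogous to (1.8) but for pure birth processes 
starting from $X(0) = 1$ is 

$$P_n(t) = \lambda_1 \cdots \lambda_{n-1}[B_{1,n} e^{-\lambda_1 t} + \cdots + B_{n,n} e^{-\lambda_n t}], \qquad n > 1. \tag{1.11}$$

When $\lambda_n = \beta n$, we will show that (1.11) reduces to the solution given in 
(1.10) for a Yule process with parameter $\beta$. Then 

$$\begin{aligned}
B_{1,n} &= \frac{1}{(\lambda_2 - \lambda_1)(\lambda_3 - \lambda_1) \cdots (\lambda_n - \lambda_1)} \\
&= \frac{1}{\beta^{n-1}(1)(2) \cdots (n - 1)} \\
&= \frac{1}{\beta^{n-1}(n - 1)!}, \\
B_{2,n} &= \frac{1}{(\lambda_1 - \lambda_2)(\lambda_3 - \lambda_2) \cdots (\lambda_n - \lambda_2)} \\
&= \frac{1}{\beta^{n-1}(-1)(1)(2) \cdots (n - 2)} \\
&= \frac{-1}{\beta^{n-1}(n - 2)!},
\end{aligned}$$

and 

$$\begin{aligned}
B_{k,n} &= \frac{1}{(\lambda_1 - \lambda_k) \cdots (\lambda_{k-1} - \lambda_k)(\lambda_{k+1} - \lambda_k) \cdots (\lambda_n - \lambda_k)} \\
&= \frac{(-1)^{k-1}}{\beta^{n-1}(k - 1)!\,(n - k)!}.
\end{aligned}$$

Thus, according to (1.11), 

$$\begin{aligned}
P_n(t) &= \beta^{n-1}(n - 1)!\,(B_{1,n} e^{-\beta t} + \cdots + B_{n,n} e^{-n\beta t}) \\
&= \sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!\,(n - k)!} (-1)^{k-1} e^{-k\beta t} \\
&= e^{-\beta t} \sum_{j=0}^{n-1} \frac{(n - 1)!}{j!\,(n - 1 - j)!} (-e^{-\beta t})^{j} \\
&= e^{-\beta t}(1 - e^{-\beta t})^{n-1} \qquad [\text{see I, (6.17)}],
\end{aligned}$$

which establishes (1.10). 


Exercises 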


1.1. A pure birth process starting from $X(0) = 0$ has birth parameters 
$\lambda_0 = 1$, $\lambda_1 = 3$, $\lambda_2 = 2$, $\lambda_3 = 5$. Determine $P_n(t)$ for $n = 0, 1, 2, 3$. 


1.2. A pure birth process starting from $X(0) = 0$ has birth parameters 
$\lambda_0 = 1$, $\lambda_1 = 3$, $\lambda_2 = 2$, $\lambda_3 = 5$. Let $W_3$ be the random time that it takes the 
process to reach state 3. 


(a) Write $W_3$ as a sum of sojourn times and thereby deduce that the 
mean time is $E[W_3] = 11/6$. 

(b) Determine the mean of $W_1 + W_2 + W_3$. 

(c) What is the variance of $W_3$? 


1.3. A population of organisms evolves as follows. Each organism 
exists, independent of the other organisms, for an exponentially distributed 
length of time with parameter $\theta$, and then splits into two new organisms, 
each of which exists, independent of the other organisms, for an 
exponentially distributed length of time with parameter $\theta$, and then splits into 
two new organisms, and so on. Let $X(t)$ denote the number of organisms 
existing at time $t$. Show that $X(t)$ is a Yule process. 


1.4. Consider an experiment in which a certain event will occur with 
probability $\alpha h$ and will not occur with probability $1 - \alpha h$, where $\alpha$ is a 
fixed positive parameter and $h$ is a small ($h < 1/\alpha$) positive variable. 
Suppose that $n$ independent trials of the experiment are carried out, and the 
total number of times that the event occurs is noted. Show that 


(a) The probability that the event never occurs during the $n$ trials is 
$1 - n\alpha h + o(h)$; 

(b) The probability that the event occurs exactly once is $n\alpha h + o(h)$; 

(c) The probability that the event occurs twice or more is $o(h)$. 


Hint: Use the binomial expansion 

$$(1 - \alpha h)^n = 1 - n\alpha h + \frac{n(n - 1)}{2}(\alpha h)^2 - \cdots.$$


1.5. Using equation (1.10), calculate the mean and variance for the Yule 
process where $X(0) = 1$. 


1.6. Operations 1, 2, and 3 are to be performed in succession on a major 
piece of equipment. Operation $k$ takes a random duration $S_k$ that is 
exponentially distributed with parameter $\lambda_k$ for $k = 1, 2, 3$, and all operation 
times are independent. Let $X(t)$ denote the operation being performed at 
time $t$, with time $t = 0$ marking the start of the first operation. Suppose 
that $\lambda_1 = 5$, $\lambda_2 = 3$, and $\lambda_3 = 13$. Determine 


(a) $P_1(t) = \Pr\{X(t) = 1\}$. 
(b) $P_2(t) = \Pr\{X(t) = 2\}$. 
(c) $P_3(t) = \Pr\{X(t) = 3\}$. 


Problems 


1.1. Let $X(t)$ be a Yule process that is observed at a random time $U$, 
where $U$ is uniformly distributed over $[0, 1)$. Show that 
$\Pr\{X(U) = k\} = p^k/(\beta k)$ for $k = 1, 2, \ldots$, with $p = 1 - e^{-\beta}$. 


Hint: Integrate (1.10) over $t$ between 0 and 1. 


1.2. A Yule process with immigration has birth parameters $\lambda_k = \alpha + k\beta$ 
for $k = 0, 1, 2, \ldots$. Here $\alpha$ represents the rate of immigration into the 
population, and $\beta$ represents the individual birth rate. Supposing that 
$X(0) = 0$, determine $P_n(t)$ for $n = 0, 1, 2, \ldots$. 


1.3. Consider a population comprising a fixed number $N$ of individuals. 
Suppose that at time $t = 0$ there is exactly one infected individual and 
$N - 1$ susceptible individuals in the population. Once infected, an 
individual remains in that state forever. In any short time interval of length $h$, 
any given infected person will transmit the disease to any given 
susceptible person with probability $\alpha h + o(h)$. (The parameter $\alpha$ is the individual 
infection rate.) Let $X(t)$ denote the number of infected individuals in the 
population at time $t \ge 0$. Then $X(t)$ is a pure birth process on the states 
$0, 1, \ldots, N$. Specify the birth parameters. 


1.4. A new product (a “Home Helicopter” to solve the commuting 
problem) is being introduced. The sales are expected to be determined by both 
media (newspaper and television) advertising and word-of-mouth 
advertising, wherein satisfied customers tell others about the product. Assume 
that media advertising creates new customers according to a Poisson 
process of rate $\alpha = 1$ customer per month. For the word-of-mouth 
advertising, assume that each purchaser of a Home Helicopter will generate 
sales to new customers at a rate of $\theta = 2$ customers per month. Let $X(t)$ 
be the total number of Home Helicopter customers up to time $t$. 


(a) Model $X(t)$ as a pure birth process by specifying the birth 
parameters $\lambda_k$, for $k = 0, 1, \ldots$. 

(b) What is the probability that exactly two Home Helicopters are sold 
during the first month? 


1.5. Let $W_k$ be the time to the $k$th birth in a pure birth process starting 
from $X(0) = 0$. Establish the equivalence 

$$\Pr\{W_1 > t,\ W_2 > t + s\} = P_0(t)[P_0(s) + P_1(s)].$$

From this relation together with equation (1.7), determine the joint 
density for $W_1$ and $W_2$, and then the joint density of $S_0 = W_1$ and 
$S_1 = W_2 - W_1$. 


1.6. A fatigue model for the growth of a crack in a discrete lattice 
proposes that the size of the crack evolves as a pure birth process with 
parameters 

$$\lambda_k = (1 + k)^{\rho} \qquad \text{for } k = 1, 2, \ldots.$$

The theory behind the model postulates that the growth rate of the crack 
is proportional to some power of the stress concentration at its ends, and 
that this stress concentration is itself proportional to some power of 
$1 + k$, where $k$ is the crack length. Use the sojourn time description to 
deduce that the mean time for the crack to grow to infinite length is finite 
when $\rho > 1$, and that therefore the failure time of the system is a 
well-defined and finite-valued random variable. 


1.7. Let $\lambda_0$, $\lambda_1$, and $\lambda_2$ be the parameters of the independent 
exponentially distributed random variables $S_0$, $S_1$, and $S_2$. Assume that no two of 
the parameters are equal. 


(a) Verify that 

$$\Pr\{S_0 > t\} = e^{-\lambda_0 t},$$

$$\Pr\{S_0 + S_1 > t\} = \frac{\lambda_1}{\lambda_1 - \lambda_0} e^{-\lambda_0 t} + \frac{\lambda_0}{\lambda_0 - \lambda_1} e^{-\lambda_1 t},$$

and evaluate in similar terms 

$$\Pr\{S_0 + S_1 + S_2 > t\}.$$

(b) Verify equation (1.8) in the case that $n = 2$ by evaluating 

$$P_2(t) = \Pr\{X(t) = 2\} = \Pr\{S_0 + S_1 + S_2 > t\} - \Pr\{S_0 + S_1 > t\}.$$


1.8. Let $N(t)$ be a pure birth process for which 

$$\Pr\{\text{an event happens in } (t, t + h) \mid N(t) \text{ is odd}\} = \alpha h + o(h),$$

$$\Pr\{\text{an event happens in } (t, t + h) \mid N(t) \text{ is even}\} = \beta h + o(h),$$

where $o(h)/h \to 0$ as $h \downarrow 0$. Take $N(0) = 0$. Find the following 
probabilities: 

$$P_0(t) = \Pr\{N(t) \text{ is even}\}; \qquad P_1(t) = \Pr\{N(t) \text{ is odd}\}.$$

Hint: Derive the differential equations 

$$P_0'(t) = \alpha P_1(t) - \beta P_0(t) \qquad \text{and} \qquad P_1'(t) = -\alpha P_1(t) + \beta P_0(t),$$

and solve them by using $P_0(t) + P_1(t) = 1$. 


1.9. Under the conditions of Problem 1.8, determine $E[N(t)]$. 


1.10. Consider a pure birth process on the states $0, 1, \ldots, N$ for which 
$\lambda_k = (N - k)\lambda$ for $k = 0, 1, \ldots, N$. Suppose that $X(0) = 0$. Determine 
$P_n(t) = \Pr\{X(t) = n\}$ for $n = 0, 1$, and 2. 


1.11. Beginning with $P_0(t) = e^{-\lambda_0 t}$ and using equation (1.5), calculate 
$P_1(t)$, $P_2(t)$, and $P_3(t)$ and verify that these probabilities conform with 
equation (1.7), assuming distinct birth parameters. 


1.12. Verify that $P_2(t)$, as given by (1.8), satisfies (1.5) by following the 
calculations in the text that showed that $P_1(t)$ satisfies (1.5). 


1.13. Using (1.5), derive $P_n(t)$ when all birth parameters are the same 
constant $\lambda$ and show that 

$$P_n(t) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \qquad n = 0, 1, \ldots.$$

Thus, the postulates of Section 1.1 serve to define the Poisson processes. 


2. Pure Death Processes 


Complementing the increasing pure birth process is the decreasing pure 
death process. It moves successively through states $N, N - 1, \ldots, 2, 1$ 
and ultimately is absorbed in state 0 (extinction). The process is specified 
by the death parameters $\mu_k > 0$ for $k = 1, 2, \ldots, N$, where the sojourn 
time in state $k$ is exponentially distributed with parameter $\mu_k$, all sojourn 
times being independent. A typical sample path is depicted in Figure 2.1. 


Figure 2.1 A typical sample path of a pure death process, showing the 
sojourn times $S_N, \ldots, S_1$ and the waiting times $W_1, W_2, \ldots, W_N$. 


Alternatively, we have the infinitesimal description of a pure death 
process as a Markov process $X(t)$ whose state space is $0, 1, \ldots, N$ and for 
which 


(i) $\Pr\{X(t + h) = k - 1 \mid X(t) = k\} = \mu_k h + o(h)$, $k = 1, \ldots, N$; 
(ii) $\Pr\{X(t + h) = k \mid X(t) = k\} = 1 - \mu_k h + o(h)$, $k = 1, \ldots, N$; (2.1) 
(iii) $\Pr\{X(t + h) > k \mid X(t) = k\} = 0$, $k = 0, 1, \ldots, N$. 


The parameter $\mu_k$ is the “death rate” operating or in effect while the 
process sojourns in state $k$. It is a common and useful convention to assign 
$\mu_0 = 0$. 

When the death parameters $\mu_1, \mu_2, \ldots, \mu_N$ are distinct, that is, $\mu_j \ne \mu_k$ 
if $j \ne k$, then we have the explicit transition probabilities 

$$P_N(t) = e^{-\mu_N t};$$

and for $n < N$, 

$$\begin{aligned}
P_n(t) &= \Pr\{X(t) = n \mid X(0) = N\} \\
&= \mu_{n+1} \mu_{n+2} \cdots \mu_N [A_{n,n} e^{-\mu_n t} + \cdots + A_{N,n} e^{-\mu_N t}],
\end{aligned} \tag{2.2}$$

where 

$$A_{k,n} = \frac{1}{(\mu_N - \mu_k) \cdots (\mu_{k+1} - \mu_k)(\mu_{k-1} - \mu_k) \cdots (\mu_n - \mu_k)}.$$
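
As an illustration of (2.2), the following sketch evaluates the transition probabilities for a given list of death rates. The function name `pure_death_prob` and the list layout (`mu[k]` holding $\mu_k$, with `mu[0] = 0`) are assumptions made for the example; the rates shown are those of Exercise 2.1 and the time is arbitrary.

```python
import math

def pure_death_prob(mu, N, n, t):
    """P_n(t) = Pr{X(t) = n | X(0) = N} from (2.2).

    mu[k] is the death rate in state k, with mu[0] = 0 by convention;
    the rates mu[1..N] must be pairwise distinct and positive.
    """
    if n == N:
        return math.exp(-mu[N] * t)
    total = 0.0
    for k in range(n, N + 1):
        denom = 1.0
        for j in range(n, N + 1):
            if j != k:
                denom *= mu[j] - mu[k]              # factors of 1 / A_{k,n}
        total += math.exp(-mu[k] * t) / denom       # A_{k,n} e^{-mu_k t}
    return math.prod(mu[n + 1 : N + 1]) * total     # mu_{n+1} ... mu_N

mu = [0.0, 3.0, 2.0, 5.0]                           # parameters of Exercise 2.1
probs = [pure_death_prob(mu, 3, n, 0.4) for n in range(4)]
print(probs, sum(probs))
```

Since the state space is finite, the four probabilities should sum to 1 up to rounding, which gives a quick consistency check on the coefficients $A_{k,n}$.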


2.1. The Linear Death Process 


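As an example, consider a pure death process in which the death rates are 
proportional to population size. This process, which we will call the 
linear death process, complements the Yule, or linear birth, process. The 
parameters are $\mu_k = k\alpha$, where $\alpha$ is the individual death rate in the 
population. Then 

$$\begin{aligned}
A_{n,n} &= \frac{1}{(\mu_N - \mu_n)(\mu_{N-1} - \mu_n) \cdots (\mu_{n+1} - \mu_n)} \\
&= \frac{1}{\alpha^{N-n}(N - n)(N - n - 1) \cdots (2)(1)}, \\
A_{n+1,n} &= \frac{1}{(\mu_N - \mu_{n+1}) \cdots (\mu_{n+2} - \mu_{n+1})(\mu_n - \mu_{n+1})} \\
&= \frac{1}{\alpha^{N-n}(N - n - 1) \cdots (1)(-1)}, \\
A_{k,n} &= \frac{1}{(\mu_N - \mu_k) \cdots (\mu_{k+1} - \mu_k)(\mu_{k-1} - \mu_k) \cdots (\mu_n - \mu_k)} \\
&= \frac{1}{\alpha^{N-n}(N - k) \cdots (1)(-1)(-2) \cdots (n - k)} \\
&= \frac{1}{\alpha^{N-n}(-1)^{k-n}(N - k)!\,(k - n)!}.
\end{aligned}$$

Then 

$$\begin{aligned}
P_n(t) &= \mu_{n+1} \mu_{n+2} \cdots \mu_N \sum_{k=n}^{N} A_{k,n} e^{-\mu_k t} \\
&= \alpha^{N-n} \frac{N!}{n!} \sum_{k=n}^{N} \frac{e^{-k\alpha t}}{\alpha^{N-n}(-1)^{k-n}(N - k)!\,(k - n)!} \\
&= \frac{N!}{n!} e^{-n\alpha t} \sum_{j=0}^{N-n} \frac{(-1)^j e^{-j\alpha t}}{(N - n - j)!\,j!} \\
&= \frac{N!}{n!\,(N - n)!} e^{-n\alpha t}(1 - e^{-\alpha t})^{N-n}, \qquad n = 0, \ldots, N.
\end{aligned} \tag{2.3}$$

Let $T$ be the time of population extinction. Formally, 
$T = \min\{t \ge 0;\ X(t) = 0\}$. Then $T \le t$ if and only if $X(t) = 0$, which leads to the 
cumulative distribution function of $T$ via 

$$F_T(t) = \Pr\{T \le t\} = \Pr\{X(t) = 0\} = P_0(t) = (1 - e^{-\alpha t})^N, \qquad t \ge 0. \tag{2.4}$$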
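
The chain of equalities in (2.3) can be spot-checked numerically. The sketch below evaluates both the alternating sum and the final binomial form and compares them, and also evaluates (2.4); the function names and the sample values of `N`, `alpha`, and `t` are illustrative assumptions.

```python
import math

def linear_death_binomial(N, alpha, n, t):
    """Last line of (2.3): binomial probability of n survivors out of N."""
    p = math.exp(-alpha * t)                  # survival probability of one individual
    return math.comb(N, n) * p ** n * (1.0 - p) ** (N - n)

def linear_death_series(N, alpha, n, t):
    """Third line of (2.3): the alternating sum before it is collapsed."""
    s = sum((-1) ** j * math.exp(-j * alpha * t)
            / (math.factorial(N - n - j) * math.factorial(j))
            for j in range(N - n + 1))
    return math.factorial(N) / math.factorial(n) * math.exp(-n * alpha * t) * s

def extinction_cdf(N, alpha, t):
    """F_T(t) from (2.4)."""
    return (1.0 - math.exp(-alpha * t)) ** N

N, alpha, t = 5, 2.0, 0.3
probs = [linear_death_binomial(N, alpha, n, t) for n in range(N + 1)]
series = [linear_death_series(N, alpha, n, t) for n in range(N + 1)]
print(probs)
```

The binomial form says that each of the $N$ individuals is independently still alive at time $t$ with probability $e^{-\alpha t}$, an interpretation the text develops next.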


The linear death process can be viewed in yet another way, a way that 
again confirms the intimate connection between the exponential 
distribution and a continuous time parameter Markov chain. Consider a 
population consisting of $N$ individuals, each of whose lifetimes is an independent 
exponentially distributed random variable with parameter $\alpha$. Let $X(t)$ be 
the number of survivors in this population at time $t$. Then $X(t)$ is the 
linear pure death process whose parameters are $\mu_k = k\alpha$ for $k = 0, 1, \ldots, N$. 
To help understand this connection, let $\xi_1, \xi_2, \ldots, \xi_N$ denote the times 
of death of the individuals labeled $1, 2, \ldots, N$, respectively. Figure 2.2 
shows the relation between the individual lifetimes 
$\xi_1, \xi_2, \ldots, \xi_N$ and the death process $X(t)$. 

The sojourn time in state $N$, denoted by $S_N$, equals the time of the 
earliest death, or $S_N = \min\{\xi_1, \ldots, \xi_N\}$. Since the lifetimes are independent 
and have the same exponential distribution, 

$$\begin{aligned}
\Pr\{S_N > t\} &= \Pr\{\min\{\xi_1, \ldots, \xi_N\} > t\} \\
&= \Pr\{\xi_1 > t, \ldots, \xi_N > t\} \\
&= [\Pr\{\xi_1 > t\}]^N \\
&= e^{-N\alpha t}.
\end{aligned}$$

Figure 2.2 The linear death process. As depicted here, the third individual is 
the first to die, the first individual is the second to die, etc. 


That is, $S_N$ has an exponential distribution with parameter $N\alpha$. Similar 
reasoning applies when there are $k$ members alive in the population. The 
memoryless property of the exponential distribution implies that the 
remaining lifetime of each of these $k$ individuals is exponentially distributed 
with parameter $\alpha$. Then the sojourn time $S_k$ is the minimum of these $k$ 
remaining lifetimes and hence is exponentially distributed with parameter 
$k\alpha$. To give one more approach in terms of transition rates, each 
individual in the population has a constant death rate of $\alpha$ in the sense that 

$$\begin{aligned}
\Pr\{t < \xi_1 < t + h \mid t < \xi_1\} &= \frac{\Pr\{t < \xi_1 < t + h\}}{\Pr\{t < \xi_1\}} \\
&= \frac{e^{-\alpha t} - e^{-\alpha(t + h)}}{e^{-\alpha t}} \\
&= 1 - e^{-\alpha h} \\
&= \alpha h + o(h) \qquad \text{as } h \downarrow 0.
\end{aligned}$$

If each of $k$ individuals alive in the population at time $t$ has a constant 
death rate of $\alpha$, then the total population death rate should be $k\alpha$, directly 
proportional to the population size. This shortcut approach to specifying 
appropriate death parameters is a powerful and often-used tool of 
stochastic modeling. The next example furnishes another illustration of its 
use. 
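
The claim that the minimum of $N$ independent exponential lifetimes with parameter $\alpha$ is exponential with parameter $N\alpha$ can be checked by a short simulation. The values of `N`, `alpha`, the seed, and the number of trials below are arbitrary choices for the sketch.

```python
import random

rng = random.Random(2024)
N, alpha, trials = 6, 0.5, 50000

# Time of the earliest death among N independent exponential lifetimes.
first_death = [min(rng.expovariate(alpha) for _ in range(N)) for _ in range(trials)]

sample_mean = sum(first_death) / trials
print(sample_mean, 1.0 / (N * alpha))      # sample mean versus the mean 1 / (N * alpha)
```

A fuller check would compare the empirical distribution function with $1 - e^{-N\alpha t}$; the mean is used here only to keep the sketch short.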


2.2. Cable Failure Under Static Fatigue 


A cable composed of parallel fibers under tension is being designed to 
support a high-altitude weather balloon. With a design load of 1000 kg 
and a design lifetime of 100 years, how many fibers should be used in the 
cable? 

The low-weight, high-strength fibers to be used are subject to static 
fatigue, or eventual failure when subjected to a constant load. The higher 
the constant load, the shorter the life, and experiments have established a 
linear plot on log—log axes between average failure time and load that is 
shown in Figure 2.3. 

The relation between mean life $\mu_T$ and load $l$ that is illustrated in 
Figure 2.3 takes the analytic form 

$$\log_{10} \mu_T = 2 - 40 \log_{10} l.$$

Figure 2.3 A linear relation between log mean failure time and log load. 
[Axes: log (load) versus log (time); marked point at log (1 kg), log (100 yrs).] 

Were the cable to be designed on the basis of average life, to achieve 
the 100 year design target each fiber should carry 1 kg. Since the total load 
is 1000 kg, $N = 1000/1 = 1000$ fibers should be used in the cable. 

One might suppose that this large number (N = 1000) of fibers would 
justify designing the cable based on average fiber properties. We shall see 
that such reasoning is dangerously wrong. 

Let us suppose, however, as is the case with many modern high-perfor- 
mance structural materials, that there is a large amount of random scatter 
of individual fiber lifetimes about the mean. How does this randomness 
affect the design problem? 

Some assumption must be made concerning the probability distribution 
governing individual fiber lifetimes. In practice, it is extremely difficult to 
gather sufficient data to determine this distribution with any degree of cer- 
tainty. Most data do show, however, a significant degree of skewness, or 
asymmetry. Because it qualitatively matches observed data and because it 
leads to a pure death process model that is accessible to exhaustive 
analysis, we will assume that the probability distribution for the failure time $T$ 
of a single fiber subjected to the time-varying tensile load $l(t)$ is given by 

$$\Pr\{T \le t\} = 1 - \exp\left\{-\int_0^t K[l(s)]\, ds\right\}, \qquad t \ge 0.$$

This distribution corresponds to a failure rate, or hazard rate, of 
$r(t) = K[l(t)]$ wherein a single fiber, having not failed prior to time $t$ and 
carrying the load $l(t)$, will fail during the interval $(t, t + \Delta t]$ with 
probability 

$$\Pr\{t < T \le t + \Delta t \mid T > t\} = K[l(t)]\, \Delta t + o(\Delta t).$$

The function $K[l]$, called the breakdown rule, expresses how changes in 
load affect the failure probability. We are concerned with the power law 
breakdown rule in which $K[l] = l^{\beta}/A$ for some positive constants $A$ and $\beta$. 
Assuming power law breakdown, under a constant load $l(t) = l$, the single 
fiber failure time is exponentially distributed with mean 
$\mu_T = E[T \mid l] = 1/K[l] = A l^{-\beta}$. A plot of mean failure time versus load is linear on log–log 
axes, matching the observed properties of our fiber type. For the design 
problem we have $\beta = 40$ and $A = 100$. 
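
To make the power law breakdown rule concrete, the following sketch codes $K[l]$ and the mean fiber life with the design constants quoted above. The function names are illustrative, and the 5 percent overload in the example is a hypothetical value chosen to show how steeply mean life depends on load.

```python
import math

A, BETA = 100.0, 40.0              # design constants quoted in the text

def breakdown_rate(load):
    """Power law breakdown rule K[l] = l**beta / A."""
    return load ** BETA / A

def mean_fiber_life(load):
    """Mean of the exponential failure time under a constant load: 1 / K[l]."""
    return 1.0 / breakdown_rate(load)

print(mean_fiber_life(1.0))        # mean life in years at the nominal 1 kg load
print(mean_fiber_life(1.05))       # mean life in years at a hypothetical 5 percent overload
```

With $\beta = 40$, a 5 percent increase in load divides the mean life by $1.05^{40}$, about 7, leaving roughly 14 years. This sensitivity is what makes load sharing among surviving fibers so damaging in the cable model that follows.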




Now place $N$ of these fibers in parallel and subject the resulting bundle 
or cable to a total load, constant in time, of $NL$, where $L$ is the nominal 
load per fiber. What is the probability distribution of the time at which the 
cable fails? Since the fibers are in parallel, this system failure time equals 
the failure time of the last fiber. 

Under the stated assumptions governing single-fiber behavior, $X(t)$, the 
number of unfailed fibers in the cable at time $t$, evolves as a pure death 
process with parameters $\mu_k = kK[NL/k]$ for $k = 1, 2, \ldots, N$. Given 
$X(t) = k$ surviving fibers at time $t$ and assuming that the total bundle load 
$NL$ is shared equally among them, then each carries load $NL/k$ and has a 
corresponding failure rate of $K[NL/k]$. As there are $k$ such survivors in the 
bundle, the bundle, or system, failure rate is $\mu_k = kK[NL/k]$ as claimed. 

It was mentioned earlier that the system failure time was $W_N$, the 
waiting time to the $N$th fiber failure. Then $\Pr\{W_N \le t\} = \Pr\{X(t) = 0\} = P_0(t)$, 
where $P_0(t)$ is given explicitly by (2.2) in terms of $\mu_1, \ldots, \mu_N$. 
Alternatively, we may bring to bear the sojourn time description of the pure death 
process and, following Figure 2.1, write 

$$W_N = S_N + S_{N-1} + \cdots + S_1,$$

where $S_N, S_{N-1}, \ldots, S_1$ are independent exponentially distributed random 
variables and $S_k$ has parameter $\mu_k = kK[NL/k] = k(NL/k)^{\beta}/A$. The mean 
system failure time is readily computed to be 


$$\begin{aligned}
E[W_N] &= E[S_N] + \cdots + E[S_1] \\
&= A L^{-\beta} \sum_{k=1}^{N} \frac{1}{k} \left(\frac{k}{N}\right)^{\beta} \\
&= A L^{-\beta} \sum_{k=1}^{N} \left(\frac{k}{N}\right)^{\beta - 1} \left(\frac{1}{N}\right).
\end{aligned} \tag{2.5}$$

The sum in the expression for $E[W_N]$ seems formidable at first glance, but 
a very close approximation is readily available when $N$ is large. Figure 2.4 
compares the sum to an integral. 

From Figure 2.4 we see that 

$$\sum_{k=1}^{N} \left(\frac{k}{N}\right)^{\beta - 1} \left(\frac{1}{N}\right) \approx \int_0^1 x^{\beta - 1}\, dx = \frac{1}{\beta}.$$

Figure 2.4 The sum $\sum_{k=1}^{N} (k/N)^{\beta - 1}(1/N)$ is a Riemann approximation 
to $\int_0^1 x^{\beta - 1}\, dx = 1/\beta$. 


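Indeed, we readily obtain 

$$\begin{aligned}
\frac{1}{\beta} = \int_0^1 x^{\beta - 1}\, dx &\le \sum_{k=1}^{N} \left(\frac{k}{N}\right)^{\beta - 1} \left(\frac{1}{N}\right) \le \int_{1/N}^{1 + 1/N} x^{\beta - 1}\, dx \\
&= \frac{1}{\beta} \left[ \left(1 + \frac{1}{N}\right)^{\beta} - \left(\frac{1}{N}\right)^{\beta} \right].
\end{aligned}$$

When $N = 1000$ and $\beta = 40$, the numerical bounds are 

$$\frac{1}{40} \le \sum_{k=1}^{N} \left(\frac{k}{N}\right)^{\beta - 1} \left(\frac{1}{N}\right) \le \frac{1}{40}(1.0408),$$

which shows that the integral determines the sum to within about 4 
percent. 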
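
The bounds can be confirmed by computing the sum directly. The sketch below does so for $N = 1000$ and $\beta = 40$; the function name `riemann_sum` is an illustrative choice.

```python
def riemann_sum(N, beta):
    """The sum sum_{k=1}^{N} (k/N)**(beta - 1) * (1/N) appearing in E[W_N]."""
    return sum((k / N) ** (beta - 1) for k in range(1, N + 1)) / N

N, beta = 1000, 40
lower = 1.0 / beta                                        # integral over [0, 1]
upper = ((1 + 1 / N) ** beta - (1 / N) ** beta) / beta    # integral over [1/N, 1 + 1/N]
value = riemann_sum(N, beta)
print(lower, value, upper, upper / lower)
```

The computed sum falls between the two integrals, and the ratio of the upper to the lower bound is about 1.04, in line with the 4 percent figure.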


Substituting $1/\beta$ for the sum in (2.5) gives the average cable life 

$$E[W_N] \approx \frac{A}{\beta L^{\beta}},$$

to be compared with the average fiber life of 

$$\mu_T = \frac{A}{L^{\beta}}.$$

That is, a cable lasts only about $1/\beta$ as long as an average fiber under an 
equivalent load. With $A = 100$, $\beta = 40$, and $N = 1000$, the designed cable 
would last, on the average, $100/[40(1)^{40}] = 2.5$ years, far short of the 
desired life of 100 years. The cure is to increase the number of fibers in the 
cable, thereby decreasing the per fiber load. Increasing the number 
of fibers from $N$ to $N'$ decreases the nominal load per fiber from 
$L$ to $L' = NL/N'$. To achieve parity in fiber-cable lifetimes we equate 

$$\frac{A}{L^{\beta}} = \frac{A}{\beta (NL/N')^{\beta}},$$

or 

$$N' = N \beta^{1/\beta}.$$

For the given data, this calls for $N' = 1000(40)^{1/40} = 1097$ fibers. That is, 
the design lifetime can be restored by increasing the number of fibers in 
the cable by about 10 percent. 
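
The design calculation can be reproduced in a few lines. The sketch below computes the mean cable life exactly as the sum of the mean sojourn times $1/\mu_k$, rather than through the integral approximation, and evaluates the fiber count $N'$; the function names and the rounding up to a whole fiber are assumptions of the example.

```python
import math

def mean_cable_life(N, total_load, A=100.0, beta=40.0):
    """E[W_N] as the sum of 1/mu_k, with mu_k = k * (total_load / k)**beta / A."""
    return sum(A / (k * (total_load / k) ** beta) for k in range(1, N + 1))

def fibers_for_parity(N, beta):
    """N' = N * beta**(1/beta), rounded up to a whole fiber."""
    return math.ceil(N * beta ** (1.0 / beta))

n_new = fibers_for_parity(1000, 40)
print(mean_cable_life(1000, 1000.0))     # original design, N = 1000 fibers
print(n_new, mean_cable_life(n_new, 1000.0))
```

Under these assumptions the original design has a mean life of about 2.5 years, and the enlarged cable a little over 100 years, consistent with the text's conclusion.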


Exercises 


2.1. A pure death process starting from $X(0) = 3$ has death parameters 
$\mu_0 = 0$, $\mu_1 = 3$, $\mu_2 = 2$, $\mu_3 = 5$. Determine $P_n(t)$ for $n = 0, 1, 2, 3$. 


2.2. A pure death process starting from $X(0) = 3$ has death parameters 
$\mu_0 = 0$, $\mu_1 = 3$, $\mu_2 = 2$, $\mu_3 = 5$. Let $W_3$ be the random time that it takes 
the process to reach state 0. 


(a) Write $W_3$ as a sum of sojourn times and thereby deduce that the 
mean time is $E[W_3] = 31/30$. 

(b) Determine the mean of $W_1 + W_2 + W_3$. 

(c) What is the variance of $W_3$? 


2.3. Give the transition probabilities for the pure death process 
described by $X(0) = 3$, $\mu_1 = 1$, $\mu_2 = 2$, and $\mu_3 = 3$. 


2.4. Consider the linear death process (Section 2.1) in which 
$X(0) = N = 5$ and $\alpha = 2$. Determine $\Pr\{X(t) = 2\}$. 


Hint: Use equation (2.3). 


Problems 


2.1. Let $X(t)$ be a pure death process starting from $X(0) = N$. Assume 
that the death parameters are $\mu_1, \mu_2, \ldots, \mu_N$. Let $T$ be an independent 
exponentially distributed random variable with parameter $\theta$. Show that 

$$\Pr\{X(T) = 0\} = \prod_{i=1}^{N} \frac{\mu_i}{\mu_i + \theta}.$$


2.2. Let $X(t)$ be a pure death process with constant death rates $\mu_k = \theta$ 
for $k = 1, 2, \ldots, N$. If $X(0) = N$, determine $P_n(t) = \Pr\{X(t) = n\}$ for 
$n = 0, 1, \ldots, N$. 


2.3. A pure death process $X(t)$ with parameters $\mu_1, \mu_2, \ldots$ starts at 
$X(0) = N$ and evolves until it reaches the absorbing state 0. Determine the 
mean area under the $X(t)$ trajectory. 


Hint: This is $E[W_1 + W_2 + \cdots + W_N]$. 


2.4. A chemical solution contains $N$ molecules of type A and $M$ 
molecules of type B. An irreversible reaction occurs between type A and B 
molecules in which they bond to form a new compound AB. Suppose that 
in any small time interval of length $h$, any particular unbonded A molecule 
will react with any particular unbonded B molecule with probability 
$\theta h + o(h)$, where $\theta$ is a reaction rate. Let $X(t)$ denote the number of 
unbonded A molecules at time $t$. 

(a) Model $X(t)$ as a pure death process by specifying the parameters. 

(b) Assume that $N < M$, so that eventually all of the A molecules 
become bonded. Determine the mean time until this happens. 


2.5. Consider a cable composed of fibers following the breakdown rule 
$K[l] = \sinh(l) = \frac{1}{2}(e^{l} - e^{-l})$ for $l \ge 0$. Show that the mean cable life is 
given by 

$$\begin{aligned}
E[W_N] &= \sum_{k=1}^{N} \{k \sinh(NL/k)\}^{-1} = \sum_{k=1}^{N} \left\{ \frac{k}{N} \sinh\!\left(\frac{L}{k/N}\right) \right\}^{-1} \left(\frac{1}{N}\right) \\
&\approx \int_0^1 \{x \sinh(L/x)\}^{-1}\, dx.
\end{aligned}$$


2.6. Let $T$ be the time to extinction in the linear death process with 
parameters $X(0) = N$ and $\alpha$ (see Section 2.1). 


(a) Using the sojourn time viewpoint, show that 

$$E[T] = \frac{1}{\alpha} \left[ \frac{1}{N} + \frac{1}{N - 1} + \cdots + \frac{1}{1} \right].$$

(b) Verify the result of (a) by using equation (2.4) in 

$$E[T] = \int_0^{\infty} \Pr\{T > t\}\, dt = \int_0^{\infty} [1 - F_T(t)]\, dt.$$

Hint: Let $y = 1 - e^{-\alpha t}$. 


3. Birth and Death Processes 


An obvious generalization of the pure birth and pure death processes 
discussed in Sections 1 and 2 is to permit $X(t)$ both to increase and to 
decrease. Thus if at time $t$ the process is in state $n$, it may, after a random 
sojourn time, move to either of the neighboring states $n + 1$ or $n - 1$. The 
resulting birth and death process can then be regarded as the 
continuous-time analogue of a random walk (III, Section 5.3). 

Birth and death processes form a powerful tool in the kit of the sto- 
chastic modeler. The richness of the birth and death parameters facilitates 
modeling a variety of phenomena. At the same time, standard methods of 
analysis are available for determining numerous important quantities such 
as stationary distributions and mean first passage times. This section and 
later sections contain several examples of birth and death processes and 
illustrate how they are used to draw conclusions about phenomena in a va- 
riety of disciplines. 


3.1. Postulates 


As in the case of the pure birth processes, we assume that $X(t)$ is a Markov 
process on the states $0, 1, 2, \ldots$ and that its transition probabilities $P_{ij}(t)$ 
are stationary; i.e., 

$$P_{ij}(t) = \Pr\{X(t + s) = j \mid X(s) = i\} \qquad \text{for all } s \ge 0.$$

In addition, we assume that the $P_{ij}(t)$ satisfy 


(1) $P_{i,i+1}(h) = \lambda_i h + o(h)$ as $h \downarrow 0$, $i \ge 0$; 
(2) $P_{i,i-1}(h) = \mu_i h + o(h)$ as $h \downarrow 0$, $i \ge 1$; 
(3) $P_{i,i}(h) = 1 - (\lambda_i + \mu_i)h + o(h)$ as $h \downarrow 0$, $i \ge 0$; 
(4) $P_{ij}(0) = \delta_{ij}$; 
(5) $\mu_0 = 0$, $\lambda_0 > 0$, $\mu_i, \lambda_i > 0$, $i = 1, 2, \ldots$. 


The $o(h)$ in each case may depend on $i$. The matrix 

$$\mathbf{A} = \begin{bmatrix}
-\lambda_0 & \lambda_0 & 0 & 0 & \cdots \\
\mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & \cdots \\
0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & \cdots \\
0 & 0 & \mu_3 & -(\lambda_3 + \mu_3) & \cdots \\
\vdots & \vdots & \vdots & \vdots &
\end{bmatrix} \tag{3.1}$$

is called the infinitesimal generator of the process. The parameters $\lambda_i$ and 
$\mu_i$ are called, respectively, the infinitesimal birth and death rates. In 
Postulates (1) and (2) we are assuming that if the process starts in state $i$, then 
in a small interval of time the probabilities of the population increasing or 
decreasing by 1 are essentially proportional to the length of the interval. 
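
For computation one often works with a finite corner of the matrix (3.1). The following sketch builds the top-left block from birth and death rate functions; the function name `generator`, the truncation size, and the example rates are hypothetical choices for illustration.

```python
def generator(birth, death, size):
    """Top-left size x size block of the infinitesimal generator in (3.1).

    birth(i) returns lambda_i and death(i) returns mu_i; mu_0 is taken to be 0.
    """
    A = [[0.0] * size for _ in range(size)]
    for i in range(size):
        lam = birth(i)
        mu = death(i) if i > 0 else 0.0
        A[i][i] = -(lam + mu)
        if i + 1 < size:
            A[i][i + 1] = lam
        if i > 0:
            A[i][i - 1] = mu
    return A

# Hypothetical rates: constant births, deaths proportional to the state.
A = generator(lambda i: 2.0, lambda i: 0.5 * i, 5)
for row in A:
    print(row)
```

Every complete row of the generator sums to zero. In the truncated block this holds for all rows except the last, whose birth entry falls outside the block.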
Since the P,(t) are probabilities, we have P,(t) = O and 


S P(t) <1. (3.2) 
j=0 


Using the Markov property of the process we may also derive the so- 
called Chapman—Kolmogorov equation 


$$P_{ij}(t + s) = \sum_{k=0}^{\infty} P_{ik}(t) P_{kj}(s). \tag{3.3}$$

This equation states that in order to move from state i to state j in time
t + s, X(t) moves to some state k in time t and then from k to j in the
remaining time s. This is the continuous-time analogue of formula (2.2) 
in III. 

So far we have mentioned only the transition probabilities $P_{ij}(t)$. In
order to obtain the probability that X(t) = n, we must specify where the
process starts or more generally the probability distribution for the initial
state. We then have

$$\Pr\{X(t) = n\} = \sum_{i=0}^{\infty} q_i P_{in}(t),$$

where $q_i = \Pr\{X(0) = i\}$.


3.2. Sojourn Times 


With the aid of the preceding assumptions we may calculate the distribu-
tion of the random variable $S_i$, which is the sojourn time of X(t) in state i;
that is, given that the process is in state i, what is the distribution of the
time $S_i$ until it first leaves state i? If we let

$$\Pr\{S_i \ge t\} = G_i(t),$$

it follows easily by the Markov property that as $h \downarrow 0$,

$$\begin{aligned}
G_i(t + h) &= G_i(t) G_i(h) = G_i(t)[P_{ii}(h) + o(h)] \\
&= G_i(t)[1 - (\lambda_i + \mu_i)h] + o(h),
\end{aligned}$$

or

$$\frac{G_i(t + h) - G_i(t)}{h} = -(\lambda_i + \mu_i) G_i(t) + o(1),$$

so that

$$G_i'(t) = -(\lambda_i + \mu_i) G_i(t). \tag{3.4}$$

If we use the condition $G_i(0) = 1$, the solution of this equation is

$$G_i(t) = \exp[-(\lambda_i + \mu_i)t];$$

i.e., $S_i$ follows an exponential distribution with mean $(\lambda_i + \mu_i)^{-1}$. The
proof presented here is not quite complete, since we have used the intu-
itive relationship

$$G_i(h) = P_{ii}(h) + o(h)$$


without a formal proof. 

According to Postulates (1) and (2), during a time duration of length h
a transition occurs from state i to i + 1 with probability $\lambda_i h + o(h)$ and
from state i to i - 1 with probability $\mu_i h + o(h)$. It follows intuitively that,
given that a transition occurs at time t, the probability that this transition
is to state i + 1 is $\lambda_i/(\lambda_i + \mu_i)$ and to state i - 1 is $\mu_i/(\lambda_i + \mu_i)$. The rig-
orous demonstration of this result is beyond the scope of this book.

It leads to an important characterization of a birth and death process, 
however, wherein the description of the motion of X(t) is as follows: The
process sojourns in a given state i for a random length of time whose dis-
tribution function is an exponential distribution with parameter $(\lambda_i + \mu_i)$.
When leaving state i the process enters either state i + 1 or state i - 1 with
probabilities $\lambda_i/(\lambda_i + \mu_i)$ and $\mu_i/(\lambda_i + \mu_i)$, respectively. The motion is
analogous to that of a random walk except that transitions occur at ran- 
dom times rather than at fixed time periods. 

The traditional procedure for constructing birth and death processes is
to prescribe the birth and death parameters $\{\lambda_i, \mu_i\}_{i=0}^{\infty}$ and build the path
structure by utilizing the preceding description concerning the waiting
times and the conditional transition probabilities of the various states. We
determine realizations of the process as follows. Suppose X(0) = i; the
particle spends a random length of time, exponentially distributed with
parameter $(\lambda_i + \mu_i)$, in state i and subsequently moves with probability
$\lambda_i/(\lambda_i + \mu_i)$ to state i + 1 and with probability $\mu_i/(\lambda_i + \mu_i)$ to state i - 1.
Next the particle sojourns a random length of time in the new state and
then moves to one of its neighboring states, and so on. More specifically,
we observe a value $t_1$ from the exponential distribution with parameter
$(\lambda_i + \mu_i)$ that fixes the initial sojourn time in state i. Then we toss a coin
with probability of heads $p_i = \lambda_i/(\lambda_i + \mu_i)$. If heads (tails) appears we
move the particle to state i + 1 (i - 1). In state i + 1 we observe a value
$t_2$ from the exponential distribution with parameter $(\lambda_{i+1} + \mu_{i+1})$ that fixes
the sojourn time in the second state visited. If the particle at the first tran-
sition enters state i - 1, the subsequent sojourn time $t_2$ is an observation
from the exponential distribution with parameter $(\lambda_{i-1} + \mu_{i-1})$. After the
second wait is completed, a Bernoulli trial is performed that chooses the
next state to be visited, and the process continues in the same way.

A typical outcome of these sampling procedures determines a realiza- 
tion of the process. Its form might be, for example, 


$$X(t) = \begin{cases}
i, & \text{for } 0 < t < t_1, \\
i + 1, & \text{for } t_1 < t < t_1 + t_2, \\
i, & \text{for } t_1 + t_2 < t < t_1 + t_2 + t_3, \\
\;\vdots
\end{cases}$$

Thus by sampling from exponential and Bernoulli distributions appropri-
ately, we construct typical sample paths of the process. Now it is possible
to assign to this set of paths (realizations of the process) a probability
measure in a consistent way so that $P_{ij}(t)$ is determined satisfying (3.2) and
(3.3). This result is rather deep, and its rigorous discussion is beyond the 
level of this book. The process obtained in this manner is called the min- 
imal process associated with the infinitesimal matrix A defined in (3.1). 
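
The sampling procedure just described translates almost line for line into code. The following Python sketch is one possible rendering; the function name `simulate_birth_death`, its arguments, and the example rates are illustrative assumptions and are not prescribed by the text.

```python
import random

def simulate_birth_death(birth, death, start, t_max, rng=None):
    """Return a list of (jump_time, new_state) pairs, beginning with (0.0, start).

    birth(n) and death(n) are hypothetical callables returning lambda_n and mu_n.
    """
    rng = rng or random.Random()
    t, state = 0.0, start
    path = [(t, state)]
    while True:
        lam, mu = birth(state), death(state)
        rate = lam + mu
        if rate <= 0.0:               # absorbing state: no further transitions
            break
        t += rng.expovariate(rate)    # exponential sojourn with parameter lam + mu
        if t >= t_max:
            break
        # Bernoulli trial: move up with probability lam / (lam + mu), else down
        state += 1 if rng.random() < lam / rate else -1
        path.append((t, state))
    return path

# Assumed example rates: lambda_n = 0.5 n + 1 and mu_n = n (so mu_0 = 0).
path = simulate_birth_death(lambda n: 0.5 * n + 1.0, lambda n: 1.0 * n,
                            start=3, t_max=10.0, rng=random.Random(0))
```

Each entry of `path` records the time of a jump and the state entered, so the realization is piecewise constant between consecutive entries.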

The preceding construction of the minimal process is fundamental, 
since the infinitesimal parameters need not determine a unique stochastic 
process obeying (3.2), (3.3), and Postulates 1 through 5 of Section 3.1. In 
fact, there could be several Markov processes that possess the same infin- 
itesimal generator. Fortunately, such complications do not arise in the 
modeling of common phenomena. In the special case of birth and death 
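processes for which $\lambda_0 > 0$, a sufficient condition that there exists a
unique Markov process with transition probability function $P_{ij}(t)$ for
which the infinitesimal relations (3.2) and (3.3) hold is that

$$\sum_{n=0}^{\infty} \frac{1}{\lambda_n \theta_n} \sum_{k=0}^{n} \theta_k = \infty, \tag{3.5}$$

where

$$\theta_0 = 1, \qquad \theta_n = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\mu_1 \mu_2 \cdots \mu_n}, \quad n = 1, 2, \ldots.$$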


In most practical examples of birth and death processes the condition 
(3.5) is met, and the birth and death process associated with the prescribed 
parameters is uniquely determined. 


3.3. Differential Equations of Birth and Death Processes 


As in the case of the pure birth and pure death processes, the transition 
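probabilities $P_{ij}(t)$ satisfy a system of differential equations known as the
backward Kolmogorov differential equations. These are given by

$$\begin{aligned}
P_{0j}'(t) &= -\lambda_0 P_{0j}(t) + \lambda_0 P_{1j}(t), \\
P_{ij}'(t) &= \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i) P_{ij}(t) + \lambda_i P_{i+1,j}(t), \quad i \ge 1,
\end{aligned} \tag{3.6}$$

and the boundary condition $P_{ij}(0) = \delta_{ij}$.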
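To derive these we have, from equation (3.3),

$$\begin{aligned}
P_{ij}(t + h) &= \sum_{k=0}^{\infty} P_{ik}(h) P_{kj}(t) \\
&= P_{i,i-1}(h) P_{i-1,j}(t) + P_{i,i}(h) P_{ij}(t) + P_{i,i+1}(h) P_{i+1,j}(t) \\
&\quad + {\sum_k}' P_{ik}(h) P_{kj}(t),
\end{aligned} \tag{3.7}$$

where the last summation is over all $k \ne i - 1, i, i + 1$. Using Postulates
1, 2, and 3 of Section 3.1, we obtain

$$\begin{aligned}
{\sum_k}' P_{ik}(h) P_{kj}(t) &\le {\sum_k}' P_{ik}(h) \\
&= 1 - [P_{i,i}(h) + P_{i,i-1}(h) + P_{i,i+1}(h)] \\
&= 1 - [1 - (\lambda_i + \mu_i)h + o(h) + \mu_i h + o(h) + \lambda_i h + o(h)] \\
&= o(h),
\end{aligned}$$

so that

$$P_{ij}(t + h) = \mu_i h P_{i-1,j}(t) + [1 - (\lambda_i + \mu_i)h] P_{ij}(t) + \lambda_i h P_{i+1,j}(t) + o(h).$$

Transposing the term $P_{ij}(t)$ to the left-hand side and dividing the equation
by h, we obtain, after letting $h \downarrow 0$,

$$P_{ij}'(t) = \mu_i P_{i-1,j}(t) - (\lambda_i + \mu_i) P_{ij}(t) + \lambda_i P_{i+1,j}(t).$$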


The backward equations are deduced by decomposing the time interval 
(0, t + h), where h is positive and small, into the two periods 


(0, h), (h, t + h) 


and examining the transition in each period separately. In this sense the 
backward equations result from a “first step analysis,” the first step being 
over the short time interval of duration h. 
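
When the state space is finite, the backward equations can be written in matrix form as P'(t) = A P(t) with P(0) = I and integrated numerically. The Python sketch below does this with a classical fourth-order Runge-Kutta step. The restriction to finitely many states, the function names `generator`, `matmul`, and `transition_matrix`, and the default step count are all assumptions made for illustration.

```python
def generator(birth, death):
    """Finite generator from birth = [lambda_0..lambda_{N-1}], death = [mu_1..mu_N]."""
    n = len(birth) + 1
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lam = birth[i] if i < n - 1 else 0.0   # no births out of the top state
        mu = death[i - 1] if i > 0 else 0.0    # no deaths out of state 0
        if i < n - 1:
            A[i][i + 1] = lam
        if i > 0:
            A[i][i - 1] = mu
        A[i][i] = -(lam + mu)
    return A

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transition_matrix(A, t, steps=2000):
    """Integrate P'(t) = A P(t), P(0) = I, with `steps` RK4 steps of size t/steps."""
    n = len(A)
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    h = t / steps

    def axpy(X, c, Y):
        # Entrywise X + c * Y
        return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    for _ in range(steps):
        k1 = matmul(A, P)
        k2 = matmul(A, axpy(P, h / 2, k1))
        k3 = matmul(A, axpy(P, h / 2, k2))
        k4 = matmul(A, axpy(P, h, k3))
        incr = axpy(axpy(k1, 2.0, k2), 1.0, axpy(k4, 2.0, k3))  # k1 + 2k2 + 2k3 + k4
        P = axpy(P, h / 6, incr)
    return P
```

For example, `transition_matrix(generator([2.0], [3.0]), 0.7)` approximates the transition matrix at time 0.7 of a two state chain with birth rate 2 out of state 0 and death rate 3 out of state 1.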

A different result arises from a “last step analysis,” which proceeds by 
splitting the time interval (0, t + h) into the two periods 


(0, t), (t, t + h)


and adapting the preceding reasoning. From this viewpoint, under more 
stringent conditions we can derive a further system of differential 
equations 
$$\begin{aligned}
P_{i0}'(t) &= -\lambda_0 P_{i0}(t) + \mu_1 P_{i1}(t), \\
P_{ij}'(t) &= \lambda_{j-1} P_{i,j-1}(t) - (\lambda_j + \mu_j) P_{ij}(t) + \mu_{j+1} P_{i,j+1}(t), \quad j \ge 1,
\end{aligned} \tag{3.8}$$

with the same initial condition $P_{ij}(0) = \delta_{ij}$. These are known as the forward
Kolmogorov differential equations. To derive these equations we inter-
change t and h in equation (3.7), and under stronger assumptions in addi-
tion to Postulates (1), (2), and (3) it can be shown that the last term is
again o(h). The remainder of the argument is the same as before. The use-
fulness of the differential equations will become apparent in the examples
that we study in this and the next section.

A sufficient condition that (3.8) hold is that $[P_{kj}(h)]/h = o(1)$ for $k \ne j$,
$j - 1$, $j + 1$, where the o(1) term apart from tending to zero is uniformly
bounded with respect to k for fixed j as $h \to 0$. In this case it can be proved
that ${\sum_k}' P_{ik}(t) P_{kj}(h) = o(h)$.


Example Linear Growth with Immigration A birth and death process
is called a linear growth process if $\lambda_n = \lambda n + a$ and $\mu_n = \mu n$ with $\lambda > 0$,
$\mu > 0$, and $a > 0$. Such processes occur naturally in the study of biolog-
ical reproduction and population growth. If the state n describes the cur-
rent population size, then the average instantaneous rate of growth is
$\lambda n + a$. Similarly, the probability of the state of the process decreasing by
one after the elapse of a small duration of time h is $\mu n h + o(h)$. The fac-
tor $\lambda n$ represents the natural growth of the population owing to its current
size, while the second factor a may be interpreted as the infinitesimal rate
of increase of the population due to an external source such as immigra-
tion. The component $\mu n$, which gives the mean infinitesimal death rate of
the present population, possesses the obvious interpretation.
If we substitute the above values of $\lambda_n$ and $\mu_n$ in (3.8), we obtain

$$\begin{aligned}
P_{i0}'(t) &= -a P_{i0}(t) + \mu P_{i1}(t), \\
P_{ij}'(t) &= [\lambda(j - 1) + a] P_{i,j-1}(t) - [(\lambda + \mu)j + a] P_{ij}(t) \\
&\quad + \mu(j + 1) P_{i,j+1}(t), \quad j \ge 1.
\end{aligned}$$


Now, if we multiply the jth equation by j and sum, it follows that the ex- 
pected value 


$$E[X(t)] = M(t) = \sum_{j=1}^{\infty} j P_{ij}(t)$$

satisfies the differential equation

$$M'(t) = a + (\lambda - \mu) M(t),$$

with initial condition M(0) = i, if X(0) = i. The solution of this equation
is

$$M(t) = at + i \quad \text{if } \lambda = \mu,$$

and

$$M(t) = \frac{a}{\lambda - \mu}\left\{e^{(\lambda - \mu)t} - 1\right\} + i e^{(\lambda - \mu)t} \quad \text{if } \lambda \ne \mu. \tag{3.9}$$

The second moment, or variance, may be calculated in a similar way. It is
interesting to note that $M(t) \to \infty$ as $t \to \infty$ if $\lambda \ge \mu$, while if $\lambda < \mu$, the
mean population size for large t is approximately

$$\frac{a}{\mu - \lambda}.$$
These results suggest that in the second case, wherein $\lambda < \mu$, the pop-
ulation stabilizes in the long run in some form of statistical equilibrium.
Indeed, it can be shown that a limiting probability distribution $\{\pi_j\}$ exists
for which $\lim_{t \to \infty} P_{ij}(t) = \pi_j$, $j = 0, 1, \ldots$. Such limiting distributions for
general birth and death processes are the subject of Section 4. 
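
The closed form for the mean is easy to evaluate and to check against its differential equation. The following Python sketch does so; the function name `mean_population` and its argument order are assumptions made for this illustration.

```python
import math

def mean_population(t, lam, mu, a, i):
    """Mean M(t) of the linear growth process with immigration, given X(0) = i."""
    if lam == mu:
        return a * t + i
    r = lam - mu
    return (a / r) * (math.exp(r * t) - 1.0) + i * math.exp(r * t)

# Assumed example values: lambda = 1, mu = 2, a = 3, X(0) = 5.
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, mean_population(t, 1.0, 2.0, 3.0, 5))
```

With a death rate larger than the birth rate, the printed values settle toward a/(mu - lambda) as t grows, in line with the remark above.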


Example The Two State Markov Chain Consider a Markov chain
{X(t)} with states {0, 1} whose infinitesimal matrix is

$$A = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & -\alpha & \alpha \\
1 & \beta & -\beta
\end{array} \tag{3.10}$$

The process alternates between states 0 and 1. The sojourn times in state
0 are independent and exponentially distributed with parameter $\alpha$. Those
in state 1 are independent and exponentially distributed with parameter $\beta$.
This is a finite state birth and death process for which $\lambda_0 = \alpha$, $\lambda_1 = 0$,
$\mu_0 = 0$, and $\mu_1 = \beta$. The first Kolmogorov forward equation in (3.8)
becomes


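$$P_{00}'(t) = -\alpha P_{00}(t) + \beta P_{01}(t). \tag{3.11}$$

Now, $P_{01}(t) = 1 - P_{00}(t)$, which placed in (3.11) gives

$$P_{00}'(t) = \beta - (\alpha + \beta) P_{00}(t).$$

Let $Q_{00}(t) = e^{(\alpha + \beta)t} P_{00}(t)$. Then

$$\begin{aligned}
\frac{dQ_{00}(t)}{dt} &= e^{(\alpha + \beta)t} P_{00}'(t) + (\alpha + \beta) e^{(\alpha + \beta)t} P_{00}(t) \\
&= e^{(\alpha + \beta)t}\left[P_{00}'(t) + (\alpha + \beta) P_{00}(t)\right] \\
&= \beta e^{(\alpha + \beta)t},
\end{aligned}$$

which can be integrated immediately to yield

$$\begin{aligned}
Q_{00}(t) &= \beta \int e^{(\alpha + \beta)t}\,dt + C \\
&= \left(\frac{\beta}{\alpha + \beta}\right) e^{(\alpha + \beta)t} + C.
\end{aligned}$$

The initial condition $Q_{00}(0) = 1$ determines the constant of integration to
be $C = \alpha/(\alpha + \beta)$. Thus

$$Q_{00}(t) = e^{(\alpha + \beta)t} P_{00}(t) = \left(\frac{\beta}{\alpha + \beta}\right) e^{(\alpha + \beta)t} + \frac{\alpha}{\alpha + \beta} \tag{3.12}$$

and

$$P_{00}(t) = \frac{\beta}{\alpha + \beta} + \frac{\alpha}{\alpha + \beta} e^{-(\alpha + \beta)t}. \tag{3.13a}$$

Since $P_{01}(t) = 1 - P_{00}(t)$, we have

$$P_{01}(t) = \frac{\alpha}{\alpha + \beta} - \frac{\alpha}{\alpha + \beta} e^{-(\alpha + \beta)t}, \tag{3.13b}$$

and by symmetry,

$$P_{11}(t) = \frac{\alpha}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} e^{-(\alpha + \beta)t}, \tag{3.13c}$$

$$P_{10}(t) = \frac{\beta}{\alpha + \beta} - \frac{\beta}{\alpha + \beta} e^{-(\alpha + \beta)t}. \tag{3.13d}$$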
These transition probabilities assume a more succinct form if we repa-
rametrize according to $\pi = \alpha/(\alpha + \beta)$ and $\tau = \alpha + \beta$. Then

$$P_{00}(t) = (1 - \pi) + \pi e^{-\tau t}, \tag{3.14a}$$

$$P_{01}(t) = \pi - \pi e^{-\tau t}, \tag{3.14b}$$

$$P_{10}(t) = (1 - \pi) - (1 - \pi) e^{-\tau t}, \tag{3.14c}$$

and

$$P_{11}(t) = \pi + (1 - \pi) e^{-\tau t}. \tag{3.14d}$$


Observe that

$$\lim_{t \to \infty} P_{01}(t) = \lim_{t \to \infty} P_{11}(t) = \pi,$$

so that $\pi$ is the long run probability of finding the process in state 1 inde-
pendently of where the process began. The long run behavior of general
birth and death processes is the subject of the next section. 
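
The closed-form transition probabilities of the two state chain can be checked numerically against the Chapman-Kolmogorov equation (3.3). The Python sketch below is one way to do this; the helper names `two_state_P` and `matmul2` and the example rates are assumptions made for illustration.

```python
import math

def two_state_P(t, alpha, beta):
    """Transition matrix [[P00, P01], [P10, P11]] of the two state chain at time t."""
    pi = alpha / (alpha + beta)
    decay = math.exp(-(alpha + beta) * t)
    return [[(1 - pi) + pi * decay, pi - pi * decay],
            [(1 - pi) - (1 - pi) * decay, pi + (1 - pi) * decay]]

def matmul2(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Chapman-Kolmogorov check with assumed rates alpha = 2, beta = 3.
lhs = two_state_P(0.8, 2.0, 3.0)
rhs = matmul2(two_state_P(0.3, 2.0, 3.0), two_state_P(0.5, 2.0, 3.0))
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Here `max_gap` measures the largest entrywise difference between P(t + s) and P(t)P(s); it should be at the level of floating-point rounding.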


Exercises 


3.1. Particles are emitted by a radioactive substance according to a Pois-
son process of rate $\lambda$. Each particle exists for an exponentially distributed
length of time, independent of the other particles, before disappearing. Let
X(t) denote the number of particles alive at time t. Argue that X(t) is a birth
and death process, and determine the parameters.


3.2. Patients arrive at a hospital emergency room according to a Pois-
son process of rate $\lambda$. The patients are treated by a single doctor on a first
come, first served basis. The doctor treats patients more quickly when the
number of patients waiting is higher. An industrial engineering time study
suggests that the mean patient treatment time when there are k patients in
the system is of the form $m_k = \alpha - \beta k/(k + 1)$, where $\alpha$ and $\beta$ are con-
stants with $\alpha > \beta > 0$. Let N(t) be the number of patients in the system
at time t (waiting and being treated). Argue that N(t) might be modeled as
a birth and death process with parameters $\lambda_k = \lambda$ for $k = 0, 1, \ldots$ and
$\mu_k = k/m_k$ for $k = 0, 1, \ldots$. State explicitly any necessary assumptions.

3.3. Let {V(t)} be the two state Markov chain whose transition
probabilities are given by (3.14a-d). Suppose that the initial distribution
is $(1 - \pi, \pi)$. That is, assume that $\Pr\{V(0) = 0\} = 1 - \pi$ and
$\Pr\{V(0) = 1\} = \pi$. In this case, show that $\Pr\{V(t) = 1\} = \pi$ for all times
t > 0.


Problems 


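3.1. Let $\xi_n$, n = 0, 1, ..., be a two state Markov chain with transition
probability matrix

$$\mathbf{P} = \begin{array}{c|cc}
 & 0 & 1 \\ \hline
0 & \alpha & 1 - \alpha \\
1 & 1 - \alpha & \alpha
\end{array}$$

Let $\{N(t); t \ge 0\}$ be a Poisson process with parameter $\lambda$. Show that

$$X(t) = \xi_{N(t)}, \quad t \ge 0,$$

is a two state birth and death process and determine the parameters $\lambda_0$ and
$\mu_1$ in terms of $\alpha$ and $\lambda$.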


3.2. Collards were planted equally spaced in a single row in order to 
provide an experimental setup for observing the chaotic movements of the 
flea beetle (P. cruciferae). A beetle at position k in the row remains on that 
plant for a random length of time having mean $m_k$ (which varies with the
“quality” of the plant) and then is equally likely to move right (k + 1) or
left (k - 1). Model the position of the beetle at time t as a birth and death
process having parameters $\lambda_k = \mu_k = 1/(2m_k)$ for $k = 1, 2, \ldots, N - 1$,
where the plants are numbered 0, 1, ..., N. What assumptions might be 
plausible at the ends 0 and N? 


3.3. Let {V(t)} be the two state Markov chain whose transition proba-
bilities are given by (3.14a-d). Suppose that the initial distribution is
$(1 - \pi, \pi)$. That is, assume that $\Pr\{V(0) = 0\} = 1 - \pi$ and
$\Pr\{V(0) = 1\} = \pi$. For 0 < s < t, show that

$$E[V(s)V(t)] = \pi - \pi P_{10}(t - s),$$

whence

$$\operatorname{Cov}[V(s), V(t)] = \pi(1 - \pi) e^{-(\alpha + \beta)|t - s|}.$$

3.4. A Stop-and-Go Traveler The velocity V(t) of a stop-and-go trav-
eler is described by the two state Markov chain whose transition proba-
bilities are given by (3.14a-d). The distance traveled in time t is the inte-
gral of the velocity:

$$S(t) = \int_0^t V(u)\,du.$$

Assuming that the velocity at time t = 0 is V(0) = 0, determine the mean
of S(t). Take for granted the interchange of integral and expectation in

$$E[S(t)] = \int_0^t E[V(u)]\,du.$$
0 


4. The Limiting Behavior of Birth and 
Death Processes 


For a general birth and death process that has no absorbing states, it can 
be proved that the limits 


$$\lim_{t \to \infty} P_{ij}(t) = \pi_j \ge 0 \tag{4.1}$$

exist and are independent of the initial state i. It may happen that $\pi_j = 0$
for all states j. When the limits $\pi_j$ are strictly positive, however, and satisfy

$$\sum_{j=0}^{\infty} \pi_j = 1, \tag{4.2}$$

they form a probability distribution that is called, naturally enough, the
limiting distribution of the process. The limiting distribution is also a sta-
tionary distribution in that

$$\pi_j = \sum_{i=0}^{\infty} \pi_i P_{ij}(t), \tag{4.3}$$

which tells us that if the process starts in state i with probability $\pi_i$, then
at any time t it will be in state j with the same probability $\pi_j$. The proof of
(4.3) follows from (3.3) and (4.1) if we let $t \to \infty$ and use the fact that
$\sum_{i=0}^{\infty} \pi_i = 1$.

The general importance of birth and death processes as models derives
in large part from the availability of standard formulas for determining if
a limiting distribution exists and what its values are when it does. These
formulas follow from the Kolmogorov forward equations (3.8) that were
derived in Section 3.3:

$$\begin{aligned}
P_{i0}'(t) &= -\lambda_0 P_{i0}(t) + \mu_1 P_{i1}(t), \\
P_{ij}'(t) &= \lambda_{j-1} P_{i,j-1}(t) - (\lambda_j + \mu_j) P_{ij}(t) + \mu_{j+1} P_{i,j+1}(t), \quad j \ge 1,
\end{aligned} \tag{4.4}$$

with the initial condition $P_{ij}(0) = \delta_{ij}$. Now pass to the limit as $t \to \infty$ in
(4.4) and observe first that the limit of the right side of (4.4) exists ac-
cording to (4.1). Therefore, the limit of the left side, the derivatives $P_{ij}'(t)$,
exists as well. Since the probabilities are converging to a constant, the
limit of these derivatives must be zero. In summary, passing to the limit in
(4.4) produces

$$\begin{aligned}
0 &= -\lambda_0 \pi_0 + \mu_1 \pi_1, \\
0 &= \lambda_{j-1} \pi_{j-1} - (\lambda_j + \mu_j) \pi_j + \mu_{j+1} \pi_{j+1}, \quad j \ge 1.
\end{aligned} \tag{4.5}$$
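The solution to (4.5) is obtained by induction. Letting

$$\theta_0 = 1 \quad \text{and} \quad \theta_j = \frac{\lambda_0 \lambda_1 \cdots \lambda_{j-1}}{\mu_1 \mu_2 \cdots \mu_j} \quad \text{for } j \ge 1, \tag{4.6}$$

we have $\pi_1 = \lambda_0 \pi_0/\mu_1 = \theta_1 \pi_0$. Then, assuming that $\pi_k = \theta_k \pi_0$ for
$k = 1, \ldots, j$, we obtain

$$\begin{aligned}
\mu_{j+1} \pi_{j+1} &= (\lambda_j + \mu_j) \theta_j \pi_0 - \lambda_{j-1} \theta_{j-1} \pi_0 \\
&= \lambda_j \theta_j \pi_0 + (\mu_j \theta_j - \lambda_{j-1} \theta_{j-1}) \pi_0 \\
&= \lambda_j \theta_j \pi_0,
\end{aligned}$$

and finally

$$\pi_{j+1} = \theta_{j+1} \pi_0.$$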


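In order that the sequence $\{\pi_j\}$ define a distribution, we must have
$\sum_j \pi_j = 1$. If $\sum_j \theta_j < \infty$, then we may sum the following,

$$\begin{aligned}
\pi_0 &= \theta_0 \pi_0 \\
\pi_1 &= \theta_1 \pi_0 \\
\pi_2 &= \theta_2 \pi_0 \\
&\;\;\vdots \\
1 &= \Big(\sum_{k=0}^{\infty} \theta_k\Big) \pi_0
\end{aligned}$$

to see that $\pi_0 = 1/\sum_{k=0}^{\infty} \theta_k$, and then

$$\pi_j = \theta_j \pi_0 = \frac{\theta_j}{\sum_{k=0}^{\infty} \theta_k} \quad \text{for } j = 0, 1, \ldots. \tag{4.7}$$

If $\sum_k \theta_k = \infty$, then necessarily $\pi_0 = 0$, and then $\pi_j = \theta_j \pi_0 = 0$ for all j, and
there is no limiting distribution ($\lim_{t \to \infty} P_{ij}(t) = 0$ for all j).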
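
For a process with finitely many states 0, 1, ..., N, formula (4.7) reduces to a short computation. The Python sketch below is one way to carry it out; the function name `stationary_distribution`, the list-based argument layout, and the restriction to a finite state space are assumptions made for illustration.

```python
def stationary_distribution(birth, death):
    """Return [pi_0, ..., pi_N] for a finite birth and death process.

    birth = [lambda_0, ..., lambda_{N-1}] and death = [mu_1, ..., mu_N],
    with every death rate assumed strictly positive.
    """
    theta = [1.0]                          # theta_0 = 1
    for lam, mu in zip(birth, death):
        # theta_j = theta_{j-1} * lambda_{j-1} / mu_j
        theta.append(theta[-1] * lam / mu)
    total = sum(theta)
    return [th / total for th in theta]

# Assumed example: three states with lambda_0 = 2, lambda_1 = 1, mu_1 = 3, mu_2 = 4.
pi = stationary_distribution([2.0, 1.0], [3.0, 4.0])
```

Building each theta from its predecessor avoids recomputing the full products in (4.6) for every state.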


Example Linear Growth with Immigration As described in the
example at the end of Section 3.3, this process has birth parameters
$\lambda_n = a + \lambda n$ and death parameters $\mu_n = \mu n$ for $n = 0, 1, \ldots$, where
$\lambda > 0$ is the individual birth rate, $a > 0$ is the rate of immigration into the
population, and $\mu > 0$ is the individual death rate.

Suppose $\lambda < \mu$. It was shown in Section 3.3 that the population mean
M(t) converges to $a/(\mu - \lambda)$ as $t \to \infty$. Here we will determine the limit-
ing distribution of the process under the same condition $\lambda < \mu$.

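Then $\theta_0 = 1$, $\theta_1 = a/\mu$, $\theta_2 = a(a + \lambda)/[\mu(2\mu)]$, $\theta_3 = a(a + \lambda)(a + 2\lambda)/[\mu(2\mu)(3\mu)]$, and, in general,

$$\begin{aligned}
\theta_k &= \frac{a(a + \lambda) \cdots [a + (k - 1)\lambda]}{\mu^k\, k!} \\
&= \frac{(a/\lambda)[(a/\lambda) + 1] \cdots [(a/\lambda) + k - 1]}{k!} \left(\frac{\lambda}{\mu}\right)^k \\
&= \binom{(a/\lambda) + k - 1}{k} \left(\frac{\lambda}{\mu}\right)^k.
\end{aligned}$$

Now use the infinite binomial formula (I, equation (1.71)),

$$(1 - x)^{-N} = \sum_{k=0}^{\infty} \binom{N + k - 1}{k} x^k \quad \text{for } |x| < 1,$$

to determine that

$$\sum_{k=0}^{\infty} \theta_k = \sum_{k=0}^{\infty} \binom{(a/\lambda) + k - 1}{k} \left(\frac{\lambda}{\mu}\right)^k = \left(1 - \frac{\lambda}{\mu}\right)^{-(a/\lambda)}$$

when $\lambda < \mu$. Thus $\pi_0 = (1 - \lambda/\mu)^{a/\lambda}$, and

$$\pi_k = \left(\frac{\lambda}{\mu}\right)^k \frac{(a/\lambda)[(a/\lambda) + 1] \cdots [(a/\lambda) + k - 1]}{k!} \left(1 - \frac{\lambda}{\mu}\right)^{a/\lambda} \quad \text{for } k \ge 1.$$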
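
The limiting probabilities for linear growth with immigration can be generated recursively and compared with the limiting mean found in Section 3.3. The Python sketch below does this; the function name `linear_growth_limit`, the truncation level `kmax`, and the example parameter values are assumptions made for illustration.

```python
def linear_growth_limit(lam, mu, a, kmax):
    """Return [pi_0, ..., pi_kmax] for linear growth with immigration, lam < mu."""
    ratio = lam / mu
    pi = [(1.0 - ratio) ** (a / lam)]      # pi_0 = (1 - lam/mu)^(a/lam)
    for k in range(1, kmax + 1):
        # pi_k / pi_{k-1} = lambda_{k-1} / mu_k = (a + (k - 1) lam) / (k mu)
        pi.append(pi[-1] * (a + (k - 1) * lam) / (k * mu))
    return pi

# Assumed example values: lambda = 1, mu = 2, a = 3, truncated at 200 states.
probs = linear_growth_limit(1.0, 2.0, 3.0, 200)
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
```

With these values the truncated probabilities should sum to very nearly 1 and have mean close to a/(mu - lambda) = 3, since the neglected tail decays geometrically.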


Example Repairman Models A system is composed of N machines,
of which at most $M \le N$ can be operating at any one time. The rest are
“spares.” When a machine is operating, it operates a random length of
time until failure. Suppose this failure time is exponentially distributed
with parameter $\mu$.

When a machine fails, it undergoes repair. At most R machines can be
“in repair” at any one time. The repair time is exponentially distributed
with parameter $\lambda$. Thus a machine can be in any of four states: (i) operat-
ing; (ii) “up,” but not operating, i.e., a spare; (iii) in repair; (iv) waiting for
repair. There is a total of N machines in the system. At most M can be op-
erating. At most R can be in repair.

The action is diagrammed in Figure 4.1 on the next page. 

Let X(t) be the number of machines “up” at time t, either operating or
spare. Then (we assume) the number operating is min{X(t), M}, and the
number of spares is max{0, X(t) - M}. Let Y(t) = N - X(t) be the num-
ber of machines “down.” Then the number in repair is min{Y(t), R}, and
the number waiting for repair is max{0, Y(t) - R}. The foregoing formu-
las permit us to determine the number of machines in any category, once
X(t) is known.
Then X(t) is a finite state birth and death process¹ with parameters

$$\lambda_n = \lambda \times \min\{N - n, R\} = \begin{cases}
\lambda R & \text{for } n = 0, 1, \ldots, N - R, \\
\lambda(N - n) & \text{for } n = N - R + 1, \ldots, N,
\end{cases}$$

and

$$\mu_n = \mu \times \min\{n, M\} = \begin{cases}
\mu n & \text{for } n = 0, 1, \ldots, M, \\
\mu M & \text{for } n = M + 1, \ldots, N.
\end{cases}$$

¹ The definition of birth and death processes was given for an infinite number of states.
The adjustments in the definitions and analyses for the case of a finite number of states are
straightforward and even simpler than the original definitions, and are left to the reader.

[Figure 4.1 Repairman model. The FACTORY (capacity M, failure rate $\mu$) holds the X(t) machines that are “up.” Failed machines wait for repair and then enter the REPAIR SHOP (capacity R, repair rate $\lambda$); together these are the Y(t) machines that are “down.”]


It is now a routine task to determine the limiting probability distribu-
tion for any values of $\lambda$, $\mu$, N, M, and R. (See Problems 4.1 and 4.7.) In
terms of the limiting probabilities $\pi_0, \pi_1, \ldots, \pi_N$, some quantities of in-
terest are the following:

$$\text{Average Machines Operating} = \pi_1 + 2\pi_2 + \cdots + M\pi_M + M(\pi_{M+1} + \cdots + \pi_N);$$

$$\begin{aligned}
\text{Long Run Utilization} &= \frac{\text{Average Machines Operating}}{\text{Capacity}} \\
&= \frac{\pi_1 + 2\pi_2 + \cdots + M\pi_M}{M} + (\pi_{M+1} + \cdots + \pi_N);
\end{aligned}$$

$$\text{Average Idle Repair Capacity} = 1\pi_{N-R+1} + 2\pi_{N-R+2} + \cdots + R\pi_N.$$


These and other similar quantities can be used to evaluate the desir- 
ability of adding additional repair capability, additional spare machines, 
and other possible improvements. 
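
The performance measures above are straightforward to compute once the limiting probabilities are known. The following Python sketch assembles the repairman rates, applies the stationary formula of this section, and evaluates the three quantities; the function name `repairman` and the layout of its return value are assumptions made for illustration.

```python
def repairman(N, M, R, lam, mu):
    """Limiting distribution and summary measures for the repairman model.

    Returns (pi, average_operating, utilization, average_idle_repair),
    where pi[n] is the long run probability that n machines are "up".
    """
    birth = [lam * min(N - n, R) for n in range(N)]        # lambda_0 .. lambda_{N-1}
    death = [mu * min(n, M) for n in range(1, N + 1)]      # mu_1 .. mu_N
    theta = [1.0]
    for b, d in zip(birth, death):
        theta.append(theta[-1] * b / d)
    total = sum(theta)
    pi = [th / total for th in theta]
    # With n machines up, min(n, M) are operating.
    operating = sum(min(n, M) * p for n, p in enumerate(pi))
    # With n machines up, min(N - n, R) are in repair, so R minus that is idle.
    idle_repair = sum((R - min(N - n, R)) * p for n, p in enumerate(pi))
    return pi, operating, operating / M, idle_repair

# Assumed example values: 4 machines, all may operate, 4 repair slots.
pi, operating, utilization, idle_repair = repairman(4, 4, 4, 1.0, 3.0)
```

Rerunning the function with a larger R or N gives a quick comparison of the effect of extra repair capacity or extra spares.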

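The stationary distribution assumes quite simple forms in certain spe-
cial cases. For example, consider the special case in which M = N = R.
The situation arises, for instance, when each machine’s operator becomes
its repairman upon its failure. Then $\lambda_n = \lambda(N - n)$ and $\mu_n = \mu n$ for
$n = 0, 1, \ldots, N$, and following (4.6), we determine $\theta_0 = 1$, $\theta_1 = \lambda N/\mu$,
$\theta_2 = (\lambda N)\lambda(N - 1)/[\mu(2\mu)]$, and, in general,

$$\theta_k = \frac{N(N - 1) \cdots (N - k + 1)}{(1)(2) \cdots (k)} \left(\frac{\lambda}{\mu}\right)^k = \binom{N}{k} \left(\frac{\lambda}{\mu}\right)^k.$$

The binomial formula $(1 + x)^N = \sum_{k=0}^{N} \binom{N}{k} x^k$ applies to yield

$$\sum_{k=0}^{N} \theta_k = \sum_{k=0}^{N} \binom{N}{k} \left(\frac{\lambda}{\mu}\right)^k = \left(1 + \frac{\lambda}{\mu}\right)^N.$$

Thus $\pi_0 = [1 + (\lambda/\mu)]^{-N} = [\mu/(\lambda + \mu)]^N$, and

$$\begin{aligned}
\pi_k &= \binom{N}{k} \left(\frac{\lambda}{\mu}\right)^k \left(\frac{\mu}{\lambda + \mu}\right)^N \\
&= \binom{N}{k} \left(\frac{\lambda}{\lambda + \mu}\right)^k \left(\frac{\mu}{\lambda + \mu}\right)^{N-k}.
\end{aligned} \tag{4.8}$$

We recognize (4.8) as the familiar binomial distribution.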


Example Logistic Process Suppose we consider a population whose
size X(t) ranges between two fixed integers N and M (N < M) for all
$t \ge 0$. We assume that the birth and death rates per individual at time t are
given by

$$\lambda = \alpha(M - X(t)) \quad \text{and} \quad \mu = \beta(X(t) - N),$$

and that the individual members of the population act independently of
each other. The resulting birth and death rates for the population then be-
come

$$\lambda_n = \alpha n(M - n) \quad \text{and} \quad \mu_n = \beta n(n - N).$$

To see this we observe that if the population size X(t) is n, then each of the
n individuals has an infinitesimal birth rate $\lambda$, so that $\lambda_n = \alpha n(M - n)$. The
same rationale applies in the interpretation of the $\mu_n$.

Under such conditions one would expect the process to fluctuate be-
tween the two constants N and M, since, for example, if X(t) is near M, the
death rate is high and the birth rate low, and then X(t) will tend toward N.
Ultimately the process should display stationary fluctuations between the
two limits N and M.

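The stationary distribution in this case is

$$\pi_{N+m} = \frac{c}{N + m} \binom{M - N}{m} \left(\frac{\alpha}{\beta}\right)^m, \quad m = 0, 1, 2, \ldots, M - N,$$

where c is an appropriate constant determined so that $\sum_m \pi_{N+m} = 1$. To see
this, we observe that

$$\begin{aligned}
\theta_{N+m} &= \frac{\lambda_N \lambda_{N+1} \cdots \lambda_{N+m-1}}{\mu_{N+1} \mu_{N+2} \cdots \mu_{N+m}} \\
&= \frac{\alpha^m N(N + 1) \cdots (N + m - 1)(M - N) \cdots (M - N - m + 1)}{\beta^m (N + 1) \cdots (N + m)\, m!} \\
&= \frac{N}{N + m} \binom{M - N}{m} \left(\frac{\alpha}{\beta}\right)^m.
\end{aligned}$$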
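
The simplification of the product of rate ratios can be confirmed numerically. The Python sketch below computes the product directly from the logistic rates and compares it with the binomial-coefficient form; the function names `logistic_theta` and `logistic_closed_form` and the example values are assumptions made for illustration.

```python
import math

def logistic_theta(N, M, alpha, beta, m):
    """theta_{N+m} computed directly as a product of rate ratios."""
    value = 1.0
    for k in range(1, m + 1):
        lam_prev = alpha * (N + k - 1) * (M - (N + k - 1))   # lambda_{N+k-1}
        mu_k = beta * (N + k) * k                            # mu_{N+k} = beta (N+k)(N+k-N)
        value *= lam_prev / mu_k
    return value

def logistic_closed_form(N, M, alpha, beta, m):
    """The same quantity in closed form: N/(N+m) * C(M-N, m) * (alpha/beta)^m."""
    return N / (N + m) * math.comb(M - N, m) * (alpha / beta) ** m

# Assumed example values: N = 3, M = 8, alpha = 0.7, beta = 1.3.
gaps = [abs(logistic_theta(3, 8, 0.7, 1.3, m) - logistic_closed_form(3, 8, 0.7, 1.3, m))
        for m in range(0, 6)]
```

Normalizing either list of values over m = 0, ..., M - N gives the stationary distribution, with the constant c absorbing the factor N.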


Example Some Genetic Models Consider a population consisting of
N individuals that are either of gene type a or gene type A. The state of
the process X(t) represents the number of a-individuals at time t. We as-
sume that the probability that any individual dies and is replaced by an-
other during the time interval (t, t + h) is $\lambda h + o(h)$ independent of the
values of X(t) and that the probability of two or more changes occurring
in a time interval h is o(h).

The changes in the population structure are effected as follows. An
individual is to be replaced by another chosen randomly from the popula-
tion; i.e., if X(t) = j, then an a-type is selected to be replaced with proba-
bility j/N and an A-type with probability 1 - j/N. We refer to this stage as
death. Next, birth takes place by the following rule. Another selection is
made randomly from the population to determine the type of the new in-
dividual replacing the one that died. The model introduces mutation pres-
sures that admit the possibility that the type of the new individual may be
altered upon birth. Specifically, let $\gamma_1$ denote the probability that an a-type
mutates to an A-type, and let $\gamma_2$ denote the probability of an A-type mu-
tating to an a-type.
The probability that the new individual added to the population is of
type a is

$$\frac{j}{N}(1 - \gamma_1) + \left(1 - \frac{j}{N}\right)\gamma_2. \tag{4.9}$$

We deduce this formula as follows: The probability that we select an a-
type and that no mutation occurs is $(j/N)(1 - \gamma_1)$. Moreover, the final type
may be an a-type if we select an A-type that subsequently mutates into an
a-type. The probability of this contingency is $(1 - j/N)\gamma_2$. The combina-
tion of these two possibilities gives (4.9).
We assert that the conditional probability that X(t+) - X(t) = 1 when
a change of state occurs is

$$\left(1 - \frac{j}{N}\right)\left[\frac{j}{N}(1 - \gamma_1) + \left(1 - \frac{j}{N}\right)\gamma_2\right], \quad \text{where } X(t) = j. \tag{4.10}$$


In fact, the a-type population size can increase only if an A-type dies (is 
replaced). This probability is 1 — (j/N). The second factor is the proba- 
bility that the new individual is of type a as in (4.9). 

In a similar way we find that the conditional probability that
X(t+) - X(t) = -1 when a change of state occurs is

$$\frac{j}{N}\left[\left(1 - \frac{j}{N}\right)(1 - \gamma_2) + \frac{j}{N}\gamma_1\right], \quad \text{where } X(t) = j.$$

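The number of type a individuals in the population is thus a birth and
death process with a finite number of states and infinitesimal birth and
death rates

$$\lambda_j = \lambda\left(1 - \frac{j}{N}\right)\left[\frac{j}{N}(1 - \gamma_1) + \left(1 - \frac{j}{N}\right)\gamma_2\right]$$

and

$$\mu_j = \lambda\,\frac{j}{N}\left[\frac{j}{N}\gamma_1 + \left(1 - \frac{j}{N}\right)(1 - \gamma_2)\right], \quad 0 \le j \le N.$$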
Although these parameters seem rather complicated, it is interesting to
see what happens to the stationary measure $\{\pi_k\}_{k=0}^{N}$ if we let the popula-
tion size $N \to \infty$ and the probabilities of mutation per individual $\gamma_1$ and $\gamma_2$
tend to zero in such a way that $\gamma_1 N \to \kappa_1$ and $\gamma_2 N \to \kappa_2$, where $0 < \kappa_1$,
$\kappa_2 < \infty$. At the same time, we shall transform the state of the process to
the interval [0, 1] by defining new states j/N, i.e., the fraction of a-types
in the population. To examine the stationary density at a fixed fraction x,
where 0 < x < 1, we shall evaluate $\pi_k$ as $k \to \infty$ in such a way that
k = [xN], where [xN] is the greatest integer less than or equal to xN.
Keeping these relations in mind we write


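$$\lambda_j = \frac{\lambda(N - j)}{N^2}(1 - \gamma_1 - \gamma_2)\, j\left(1 + \frac{a}{j}\right), \quad \text{where } a = \frac{N\gamma_2}{1 - \gamma_1 - \gamma_2},$$

and

$$\mu_j = \frac{\lambda(N - j)}{N^2}(1 - \gamma_1 - \gamma_2)\, j\left(1 + \frac{b}{N - j}\right), \quad \text{where } b = \frac{N\gamma_1}{1 - \gamma_1 - \gamma_2}.$$

Then

$$\begin{aligned}
\log \theta_k &= \sum_{j=0}^{k-1} \log \lambda_j - \sum_{j=1}^{k} \log \mu_j \\
&= \sum_{j=1}^{k-1} \log\left(1 + \frac{a}{j}\right) - \sum_{j=1}^{k-1} \log\left(1 + \frac{b}{N - j}\right) + \log Na \\
&\quad - \log\left[(N - k)k\left(1 + \frac{b}{N - k}\right)\right].
\end{aligned}$$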


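Now, using the expansion

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots, \quad |x| < 1,$$

it is possible to write

$$\sum_{j=1}^{k-1} \log\left(1 + \frac{a}{j}\right) = a \sum_{j=1}^{k-1} \frac{1}{j} + c_k,$$

where $c_k$ approaches a finite limit as $k \to \infty$. Therefore, using the relation

$$\sum_{j=1}^{k-1} \frac{1}{j} \sim \log k \quad \text{as } k \to \infty,$$

we have

$$\sum_{j=1}^{k-1} \log\left(1 + \frac{a}{j}\right) \sim \log k^a + c_k \quad \text{as } k \to \infty.$$

In a similar way we obtain

$$\sum_{j=1}^{k-1} \log\left(1 + \frac{b}{N - j}\right) \sim \log \frac{N^b}{(N - k)^b} + d_k \quad \text{as } k \to \infty,$$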
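where $d_k$ approaches a finite limit as $k \to \infty$. Using the above relations we
have

$$\log \theta_k \sim \log\left(C_k \frac{k^a (N - k)^b N a}{N^b (N - k) k}\right) \quad \text{as } k \to \infty, \tag{4.11}$$

where $\log C_k = c_k + d_k$, and $C_k$ approaches a limit, say C, as $k \to \infty$. Notice
that $a \to \kappa_2$ and $b \to \kappa_1$ as $N \to \infty$. Since k = [Nx], we have, for $N \to \infty$,

$$\theta_k \sim C \kappa_2 N^{\kappa_2 - 1} x^{\kappa_2 - 1} (1 - x)^{\kappa_1 - 1}.$$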


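Now, from (4.11) we have

$$\theta_k \sim C_k \kappa_2 N^{\kappa_2 - 1} \left(\frac{k}{N}\right)^{\kappa_2 - 1} \left(1 - \frac{k}{N}\right)^{\kappa_1 - 1}.$$

Therefore,

$$\frac{1}{N^{\kappa_2}} \sum_k \theta_k \sim \frac{\kappa_2}{N} \sum_k C_k \left(\frac{k}{N}\right)^{\kappa_2 - 1} \left(1 - \frac{k}{N}\right)^{\kappa_1 - 1}.$$

Since $C_k \to C$ as k tends to $\infty$, we recognize the right side as a Riemann
sum approximation of

$$\kappa_2 C \int_0^1 x^{\kappa_2 - 1} (1 - x)^{\kappa_1 - 1}\,dx.$$

Thus

$$\sum_{i=0}^{N} \theta_i \sim N^{\kappa_2} \kappa_2 C \int_0^1 x^{\kappa_2 - 1} (1 - x)^{\kappa_1 - 1}\,dx,$$

so that the resulting density on [0, 1] is

$$\frac{\theta_k}{\sum_i \theta_i} \sim \frac{1}{N}\,\frac{x^{\kappa_2 - 1}(1 - x)^{\kappa_1 - 1}}{\int_0^1 y^{\kappa_2 - 1}(1 - y)^{\kappa_1 - 1}\,dy} = \frac{x^{\kappa_2 - 1}(1 - x)^{\kappa_1 - 1}\,dx}{\int_0^1 y^{\kappa_2 - 1}(1 - y)^{\kappa_1 - 1}\,dy},$$

since $dx \sim 1/N$. This is a beta distribution with parameters $\kappa_1$ and $\kappa_2$.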


Exercises 


4.1. In a birth and death process with birth parameters $\lambda_n = \lambda$ for
$n = 0, 1, \ldots$ and death parameters $\mu_n = \mu n$ for $n = 0, 1, \ldots$, we have

$$P_{0j}(t) = \frac{(\lambda p)^j e^{-\lambda p}}{j!},$$

where

$$p = \frac{1}{\mu}\left[1 - e^{-\mu t}\right].$$

Verify that these transition probabilities satisfy the forward equations
(4.4), with i = 0.


4.2. Let X(t) be a birth and death process where the possible states 
are 0, 1,..., N, and the birth and death parameters are, respectively, 
$\lambda_n = \alpha(N - n)$ and $\mu_n = \beta n$. Determine the stationary distribution.


4.3. Determine the stationary distribution for a birth and death process
having infinitesimal parameters $\lambda_n = \alpha(n + 1)$ and $\mu_n = \beta n^2$ for
$n = 0, 1, \ldots$, where $0 < \alpha < \beta$.


4.4. Consider two machines, operating simultaneously and indepen- 
dently, where both machines have an exponentially distributed time to 
failure with mean $1/\mu$ ($\mu$ is the failure rate). There is a single repair facil-
ity, and the repair times are exponentially distributed with rate $\lambda$.


(a) In the long run, what is the probability that no machines are oper- 
ating? 

(b) How does your answer in (a) change if at most one machine can op- 
erate, and thus be subject to failure, at any time? 

4.5. Consider the birth and death parameters $\lambda_n = \theta < 1$, and
$\mu_n = n/(n + 1)$ for $n = 0, 1, \ldots$. Determine the stationary distribution.


4.6. A birth and death process has parameters $\lambda_n = \lambda$ and $\mu_n = n\mu$, for
$n = 0, 1, \ldots$. Determine the stationary distribution.


Problems 


4.1. For the repairman model of the second example of this section,
suppose that M = N = 5, R = 1, $\lambda = 2$, and $\mu = 1$. Using the limiting
distribution for the system, determine


(a) The average number of machines operating. 
(b) The equipment utilization. 
(c) The average idle repair capacity. 


How do these system performance measures change if a second repairman 
is added? 


4.2. Determine the stationary distribution, when it exists, for a birth and
death process having constant parameters $\lambda_n = \lambda$ for $n = 0, 1, \ldots$ and
$\mu_n = \mu$ for $n = 1, 2, \ldots$.


4.3. A factory has five machines and a single repairman. The operating 
time until failure of a machine is an exponentially distributed random 
variable with parameter (rate) 0.20 per hour. The repair time of a failed 
machine is an exponentially distributed random variable with parameter 
(rate) 0.50 per hour. Up to five machines may be operating at any given 
time, their failures being independent of one another, but at most one ma- 
chine may be in repair at any time. In the long run, what fraction of time 
is the repairman idle? 


4.4. This problem considers a continuous time Markov chain model for 
the changing pattern of relationships among members in a group. The 
group has four members: a, b, c, and d. Each pair of the group may or may 
not have a certain relationship with each other. If they have the relation- 
ship, we say that they are linked. For example, being linked may mean that 
the two members are communicating with each other. The following 
graph illustrates links between a and b, between a and c, and between
b and d:

[Figure 4.2: a graph on the four members a, b, c, and d, with links joining a and b, a and c, and b and d.]

Suppose that any pair of unlinked individuals will become linked in a
small time interval of length h with probability $\alpha h + o(h)$. Any pair of
linked individuals will lose their link in a small time interval of length h
with probability $\beta h + o(h)$. Let X(t) denote the number of linked pairs of
individuals in the group at time t. Then X(t) is a birth and death process.


(a) Specify the birth and death parameters $\lambda_k$ and $\mu_k$ for $k = 0, 1, \ldots$.
(b) Determine the stationary distribution for the process.
(b) Determine the stationary distribution for the process. 


4.5. A chemical solution contains $N$ molecules of type A and an equal
number of molecules of type B. A reversible reaction occurs between type
A and B molecules in which they bond to form a new compound AB.
Suppose that in any small time interval of length $h$, any particular unbonded
A molecule will react with any particular unbonded B molecule with
probability $\alpha h + o(h)$, where $\alpha$ is a reaction rate of formation. Suppose also
that in any small time interval of length $h$, any particular AB molecule
dissociates into its A and B constituents with probability $\beta h + o(h)$, where
$\beta$ is a reaction rate of dissolution. Let $X(t)$ denote the number of AB
molecules at time $t$. Model $X(t)$ as a birth and death process by specifying the
parameters.


4.6. A time-shared computer system has three terminals that are
attached to a central processing unit (CPU) that can simultaneously handle
at most two active users. If a person logs on and requests service when
two other users are active, then the request is held in a buffer until it can
receive service. Let $X(t)$ be the total number of requests that are either
active or in the buffer at time $t$. Suppose that $X(t)$ is a birth and death process
with parameters

$$\lambda_k = \begin{cases} \lambda & \text{for } k = 0, 1, 2, \\ 0 & \text{for } k = 3; \end{cases}$$

and

$$\mu_k = \begin{cases} k\mu & \text{for } k = 0, 1, 2, \\ 2\mu & \text{for } k = 3. \end{cases}$$

Determine the long run probability that the computer is fully loaded.


4.7. A system consists of three machines and two repairmen. At most
two machines can operate at any time. The amount of time that an
operating machine works before breaking down is exponentially distributed
with mean 5. The amount of time that it takes a single repairman to fix a
machine is exponentially distributed with mean 4. Only one repairman
can work on a failed machine at any given time. Let $X(t)$ be the number
of machines in operating condition at time $t$.


(a) Calculate the long run probability distribution for $X(t)$.
(b) If an operating machine produces 100 units of output per hour,
what is the long run output per hour of the system?


4.8. A birth and death process has parameters

$$\lambda_k = \alpha(k + 1) \quad \text{for } k = 0, 1, 2, \ldots,$$

and

$$\mu_k = \beta(k + 1) \quad \text{for } k = 1, 2, \ldots.$$

Assuming that $\alpha < \beta$, determine the limiting distribution of the process.
Simplify your answer as much as possible.


5. Birth and Death Processes with Absorbing States


Birth and death processes in which $\lambda_0 = 0$ arise frequently and are
correspondingly important. For these processes, the zero state is an absorbing
state. A central example is the linear-growth birth and death process
without immigration (cf. Section 3.3). In this case $\lambda_n = n\lambda$ and $\mu_n = n\mu$. Since
growth of the population results exclusively from the existing population,
it is clear that when the population size becomes zero it remains zero
thereafter; i.e., 0 is an absorbing state.


5.1. Probability of Absorption into State 0 


It is of interest to compute the probability of absorption into state 0
starting from state $i$ ($i \ge 1$). This is not, a priori, a certain event, since
conceivably the particle (i.e., state variable) may wander forever among the
states (1, 2, ...) or possibly drift to infinity.

Let $u_i$ ($i = 1, 2, \ldots$) denote the probability of absorption into state 0
from the initial state $i$. We can write a recursion formula for $u_i$ by
considering the possible states after the first transition. We know that the first
transition entails the movements

$$i \to i + 1 \quad \text{with probability } \frac{\lambda_i}{\mu_i + \lambda_i},$$

$$i \to i - 1 \quad \text{with probability } \frac{\mu_i}{\lambda_i + \mu_i}.$$

Invoking the familiar first step analysis, we directly obtain

$$u_i = \frac{\lambda_i}{\mu_i + \lambda_i}\, u_{i+1} + \frac{\mu_i}{\mu_i + \lambda_i}\, u_{i-1}, \quad i \ge 1, \tag{5.1}$$

where $u_0 = 1$.

Another method for deriving (5.1) is to consider the “embedded
random walk” associated with a given birth and death process. Specifically,
we examine the birth and death process only at the transition times. The
discrete time Markov chain generated in this manner is denoted by $\{Y_n\}_{n=0}^{\infty}$,
where $Y_0 = X_0$ is the initial state and $Y_n$ ($n \ge 1$) is the state at the $n$th
transition. Obviously, the transition probability matrix has the form

$$\mathbf{P} = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ q_1 & 0 & p_1 & 0 & \cdots \\ 0 & q_2 & 0 & p_2 & \cdots \\ \vdots & & & & \end{pmatrix},$$

where

$$p_i = \frac{\lambda_i}{\lambda_i + \mu_i} = 1 - q_i \quad \text{for } i \ge 1.$$

The probability of absorption into state 0 for the embedded random walk
is the same as for the birth and death process, since both processes
execute the same transitions. A closely related problem (gambler’s ruin) for
a random walk was examined in III, Section 6.1.
We turn to the task of solving (5.1) subject to the conditions $u_0 = 1$ and
$0 \le u_i \le 1$ ($i \ge 1$). Rewriting (5.1), we have

$$(u_{i+1} - u_i) = \frac{\mu_i}{\lambda_i}(u_i - u_{i-1}), \quad i \ge 1.$$

Defining $v_i = u_{i+1} - u_i$, we obtain

$$v_i = \frac{\mu_i}{\lambda_i}\, v_{i-1}, \quad i \ge 1.$$

Iteration of the last relation yields the formula $v_i = \rho_i v_0$, where

$$\rho_0 = 1 \quad \text{and} \quad \rho_i = \frac{\mu_1 \mu_2 \cdots \mu_i}{\lambda_1 \lambda_2 \cdots \lambda_i} \quad \text{for } i \ge 1;$$

and with $u_{i+1} - u_i = v_i$,

$$u_{i+1} - u_i = v_i = \rho_i v_0 = \rho_i(u_1 - u_0) = \rho_i(u_1 - 1) \quad \text{for } i \ge 1.$$

Summing these last equations from $i = 1$ to $i = m - 1$, we have

$$u_m - u_1 = (u_1 - 1) \sum_{i=1}^{m-1} \rho_i, \quad m > 1. \tag{5.2}$$

Since $u_m$ by its very meaning is bounded by 1, we see that if

$$\sum_{i=1}^{\infty} \rho_i = \infty, \tag{5.3}$$

then necessarily $u_1 = 1$ and $u_m = 1$ for all $m \ge 2$. In other words, if (5.3)
holds, then ultimate absorption into state 0 is certain from any initial state.
Suppose $0 < u_1 < 1$; then, of course,

$$\sum_{i=1}^{\infty} \rho_i < \infty.$$

Obviously, $u_m$ is decreasing in $m$, since passing from state $m$ to state 0
requires entering the intermediate states in the intervening time.
Furthermore, it can be shown that $u_m \to 0$ as $m \to \infty$. Now letting $m \to \infty$ in (5.2)
permits us to solve for $u_1$; thus

$$u_1 = \frac{\sum_{i=1}^{\infty} \rho_i}{1 + \sum_{i=1}^{\infty} \rho_i},$$

and then from (5.2), we obtain

$$u_m = \frac{\sum_{i=m}^{\infty} \rho_i}{1 + \sum_{i=1}^{\infty} \rho_i}, \quad m \ge 1.$$

5.2. Mean Time Until Absorption 


Consider the problem of determining the mean time until absorption,
starting from state $m$.

We assume that condition (5.3) holds, so that absorption is certain.
Notice that we cannot reduce our problem to a consideration of the
embedded random walk, since the actual time spent in each state is relevant for
the calculation of the mean absorption time.

Let $w_i$ be the mean absorption time starting from state $i$ (this could be
infinite). Considering the possible states following the first transition,
instituting a first step analysis, and recalling the fact that the mean waiting
time in state $i$ is $(\lambda_i + \mu_i)^{-1}$ (it is actually exponentially distributed with
parameter $\lambda_i + \mu_i$), we deduce the recursion relation

$$w_i = \frac{1}{\lambda_i + \mu_i} + \frac{\lambda_i}{\lambda_i + \mu_i}\, w_{i+1} + \frac{\mu_i}{\lambda_i + \mu_i}\, w_{i-1}, \quad i \ge 1, \tag{5.4}$$

where $w_0 = 0$. Letting $z_i = w_i - w_{i+1}$ and rearranging (5.4) leads to

$$z_i = \frac{1}{\lambda_i} + \frac{\mu_i}{\lambda_i}\, z_{i-1}, \quad i \ge 1. \tag{5.5}$$


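Iterating this relation gives

$$z_1 = \frac{1}{\lambda_1} + \frac{\mu_1}{\lambda_1}\, z_0,$$

$$z_2 = \frac{1}{\lambda_2} + \frac{\mu_2}{\lambda_2 \lambda_1} + \frac{\mu_2 \mu_1}{\lambda_2 \lambda_1}\, z_0,$$

$$z_3 = \frac{1}{\lambda_3} + \frac{\mu_3}{\lambda_3 \lambda_2} + \frac{\mu_3 \mu_2}{\lambda_3 \lambda_2 \lambda_1} + \frac{\mu_3 \mu_2 \mu_1}{\lambda_3 \lambda_2 \lambda_1}\, z_0,$$

and finally

$$z_m = \sum_{i=1}^{m} \frac{1}{\lambda_i} \prod_{j=i+1}^{m} \frac{\mu_j}{\lambda_j} + \left( \prod_{j=1}^{m} \frac{\mu_j}{\lambda_j} \right) z_0.$$

(The product $\prod_{j=m+1}^{m} \mu_j/\lambda_j$ is interpreted as 1.) Using the notation

$$\rho_0 = 1 \quad \text{and} \quad \rho_i = \frac{\mu_1 \mu_2 \cdots \mu_i}{\lambda_1 \lambda_2 \cdots \lambda_i}, \quad i \ge 1,$$

the expression for $z_m$ becomes

$$z_m = \sum_{i=1}^{m} \frac{1}{\lambda_i} \frac{\rho_m}{\rho_i} + \rho_m z_0,$$

or, since $z_m = w_m - w_{m+1}$ and $z_0 = w_0 - w_1 = -w_1$, then

$$\frac{1}{\rho_m}(w_m - w_{m+1}) = \sum_{i=1}^{m} \frac{1}{\lambda_i \rho_i} - w_1. \tag{5.6}$$

If $\sum_{i=1}^{\infty} (1/\lambda_i \rho_i) = \infty$, then inspection of (5.6) reveals that necessarily
$w_1 = \infty$. Indeed, it is probabilistically evident that $w_m < w_{m+1}$ for all $m$, and
this property would be violated for $m$ large if we assume to the contrary
that $w_1$ is finite.

Now suppose $\sum_{i=1}^{\infty} (1/\lambda_i \rho_i) < \infty$; then letting $m \to \infty$ in (5.6) gives

$$w_1 = \sum_{i=1}^{\infty} \frac{1}{\lambda_i \rho_i} - \lim_{m \to \infty} \frac{1}{\rho_m}(w_m - w_{m+1}).$$

It is more involved but still possible to prove that

$$\lim_{m \to \infty} \frac{1}{\rho_m}(w_m - w_{m+1}) = 0,$$

and then

$$w_1 = \sum_{i=1}^{\infty} \frac{1}{\lambda_i \rho_i}.$$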

We summarize the discussion of this section in the following theorem: 


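Theorem 5.1. Consider a birth and death process with birth and death
parameters $\lambda_n$ and $\mu_n$, $n \ge 1$, where $\lambda_0 = 0$ so that 0 is an absorbing state.
The probability of absorption into state 0 from the initial state $m$ is

$$u_m = \begin{cases} \dfrac{\sum_{i=m}^{\infty} \rho_i}{1 + \sum_{i=1}^{\infty} \rho_i} & \text{if } \sum_{i=1}^{\infty} \rho_i < \infty, \\[2ex] 1 & \text{if } \sum_{i=1}^{\infty} \rho_i = \infty. \end{cases} \tag{5.7}$$

The mean time to absorption is

$$w_m = \begin{cases} \infty & \text{if } \sum_{i=1}^{\infty} \dfrac{1}{\lambda_i \rho_i} = \infty, \\[2ex] \sum_{i=1}^{\infty} \dfrac{1}{\lambda_i \rho_i} + \sum_{k=1}^{m-1} \rho_k \sum_{j=k+1}^{\infty} \dfrac{1}{\lambda_j \rho_j} & \text{if } \sum_{i=1}^{\infty} \dfrac{1}{\lambda_i \rho_i} < \infty, \end{cases} \tag{5.8}$$

where $\rho_0 = 1$ and $\rho_i = (\mu_1 \mu_2 \cdots \mu_i)/(\lambda_1 \lambda_2 \cdots \lambda_i)$.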


Example Population Processes Consider the linear growth birth and
death process without immigration (cf. Section 3.3) for which $\mu_n = n\mu$
and $\lambda_n = n\lambda$, $n = 0, 1, \ldots$. During a short time interval of length $h$, a
single individual in the population dies with probability $\mu h + o(h)$ and gives
birth to a new individual with probability $\lambda h + o(h)$, and thus $\mu > 0$ and
$\lambda > 0$ represent the individual death and birth rates, respectively.

Substitution of $a = 0$ and $i = m$ in equation (3.9) determines the mean
population size at time $t$ for a population starting with $X(0) = m$
individuals. This mean population size is $M(t) = m e^{(\lambda - \mu)t}$, exhibiting exponential
growth or decay according as $\lambda > \mu$ or $\lambda < \mu$.

Let us now examine the extinction phenomenon and determine the 
probability that the population eventually dies out. This phenomenon cor- 
responds to absorption in state 0 for the birth and death process. 

When $\lambda_n = n\lambda$ and $\mu_n = n\mu$, a direct calculation yields $\rho_i = (\mu/\lambda)^i$, and
then

$$\sum_{i=m}^{\infty} \rho_i = \sum_{i=m}^{\infty} (\mu/\lambda)^i = \begin{cases} \dfrac{(\mu/\lambda)^m}{1 - (\mu/\lambda)} & \text{when } \lambda > \mu, \\[2ex] \infty & \text{when } \lambda \le \mu. \end{cases}$$

From Theorem 5.1, the probability of eventual extinction starting with $m$
individuals is

$$\Pr\{\text{Extinction} \mid X(0) = m\} = \begin{cases} (\mu/\lambda)^m & \text{when } \lambda > \mu, \\ 1 & \text{when } \lambda \le \mu. \end{cases} \tag{5.9}$$

When $\lambda = \mu$, the process is sure to vanish eventually. Yet in this case
the mean population size remains constant at the initial population level.
Similar situations where mean values do not adequately describe
population behavior frequently arise when stochastic elements are present.
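The closed form (5.9) can be checked numerically by truncating the sums in (5.7). The following is a minimal sketch, not part of the original text: the function name `absorption_probability`, the truncation parameter `terms`, and the particular rates used in the check are illustrative assumptions.

```python
def absorption_probability(m, birth, death, terms=500):
    """Approximate u_m from (5.7) by truncating the infinite sums.

    birth(i) and death(i) return lambda_i and mu_i for i >= 1.
    'terms' is a hypothetical truncation level; the answer is only
    accurate when the neglected tail of sum(rho_i) is negligible.
    """
    rho = 1.0            # rho_0 = 1
    rhos = []            # rhos[i - 1] holds rho_i
    for i in range(1, terms + 1):
        rho *= death(i) / birth(i)   # rho_i = rho_{i-1} * mu_i / lambda_i
        rhos.append(rho)
    return sum(rhos[m - 1:]) / (1.0 + sum(rhos))


# Linear growth with lambda = 2, mu = 1 and m = 3: (5.9) gives (1/2)^3.
u3 = absorption_probability(3, lambda n: 2.0 * n, lambda n: 1.0 * n)
print(u3)
```

When $\lambda \le \mu$ the series diverges, so the truncated ratio only approaches 1 as `terms` grows, in agreement with the second case of (5.9).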

We turn attention to the mean time to extinction assuming that
extinction is certain, that is, when $\lambda \le \mu$. For a population starting with a
single individual, then, from (5.8) with $m = 1$ we determine this mean time
to be

$$w_1 = \sum_{i=1}^{\infty} \frac{1}{\lambda_i \rho_i} = \sum_{i=1}^{\infty} \frac{1}{i\lambda} \left( \frac{\lambda}{\mu} \right)^i = \begin{cases} \dfrac{1}{\lambda} \ln\left( \dfrac{\mu}{\mu - \lambda} \right) & \text{when } \mu > \lambda, \\[2ex] \infty & \text{when } \mu = \lambda. \end{cases} \tag{5.10}$$

When the birth rate $\lambda$ exceeds the death rate $\mu$, a linear growth birth and
death process can, with strictly positive probability, grow without limit. In
contrast, many natural populations exhibit density-dependent behavior
wherein the individual birth rates decrease or the individual death rates
increase or both changes occur as the population grows. These changes are
ascribed to factors including limited food supplies, increased predation,
crowding, and limited nesting sites. Accordingly, we introduce a notion of
environmental carrying capacity $K$, an upper bound that the population
size cannot exceed.

Since all individuals have a chance of dying, with a finite carrying
capacity, all populations will eventually become extinct. Our measure of
population fitness will be the mean time to extinction, and it is of interest
to population ecologists studying colonization phenomena to examine
how the capacity $K$, the birth rate $\lambda$, and the death rate $\mu$ affect this mean
population lifetime.

The model should have the properties of exponential growth (on the
average) for small populations, as well as the ceiling $K$ beyond which the
population cannot grow. There are several ways of approaching the
population size $K$ and staying there at equilibrium. Since all such models give
more or less the same qualitative results, we stipulate the simplest model,
in which the birth parameters are

$$\lambda_n = \begin{cases} n\lambda & \text{for } n = 0, 1, \ldots, K - 1, \\ 0 & \text{for } n \ge K. \end{cases}$$

Theorem 5.1 yields $w_1$, the mean time to population extinction starting
with a single individual, as given by

$$w_1 = \sum_{i=1}^{K} \frac{\lambda_1 \lambda_2 \cdots \lambda_{i-1}}{\mu_1 \mu_2 \cdots \mu_i} = \frac{1}{\mu} \sum_{i=1}^{K} \frac{1}{i} \left( \frac{\lambda}{\mu} \right)^{i-1}. \tag{5.11}$$

Equation (5.11) isolates the distinct factors influencing the mean time
to population extinction. The first factor is $1/\mu$, the mean lifetime of an
individual, since $\mu$ is the individual death rate. Thus, the sum in (5.11)
represents the mean generations, or mean lifespans, to population extinction,
a dimensionless quantity that we denote by

$$M_g = \mu w_1 = \sum_{i=1}^{K} \frac{1}{i}\, \theta^{i-1}, \quad \text{where } \theta = \frac{\lambda}{\mu}. \tag{5.12}$$

Next we examine the influence of the birth-death, or reproduction, ratio
$\theta = \lambda/\mu$ and the carrying capacity $K$ on the mean time to extinction. Since
$\lambda$ represents the individual birth rate and $1/\mu$ is the mean lifetime of a
single member in the population, we may interpret the reproduction ratio
$\theta = \lambda(1/\mu)$ as the mean number of offspring of an arbitrary individual in
the population. Accordingly, we might expect significantly different
behavior when $\theta < 1$ as opposed to when $\theta > 1$, and this is indeed the case.
A carrying capacity of $K = 100$ is small. When $K$ is on the order of 100
or more, we have the following accurate approximations, their derivations
being sketched in Exercises 5.1 and 5.2:

$$M_g \approx \begin{cases} \dfrac{1}{\theta} \ln\left( \dfrac{1}{1 - \theta} \right) & \text{for } \theta < 1, \\[2ex] 0.5772157 + \ln K & \text{for } \theta = 1, \\[1ex] \dfrac{1}{K} \left( \dfrac{\theta^K}{\theta - 1} \right) & \text{for } \theta > 1. \end{cases} \tag{5.13}$$

The contrast between $\theta < 1$ and $\theta > 1$ is vivid. When $\theta < 1$, the mean
generations to extinction $M_g$ is almost independent of carrying capacity $K$
and approaches the asymptotic value $\theta^{-1} \ln(1 - \theta)^{-1}$ quite rapidly. When
$\theta > 1$, the mean generations to extinction $M_g$ grows exponentially in $K$.
Some calculations based on (5.13) are given in Table 5.1.
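To see how close the approximations (5.13) are to the exact sum (5.12), the two can be evaluated side by side. This is an illustrative sketch only; the function names and the sample values $\theta = 0.8, 1, 1.2$ with $K = 100$ are assumptions chosen for the comparison.

```python
import math

EULER_GAMMA = 0.5772157  # constant appearing in (5.13)


def mean_generations_exact(theta, K):
    """Exact M_g from (5.12): sum over i = 1..K of theta^(i-1) / i."""
    return sum(theta ** (i - 1) / i for i in range(1, K + 1))


def mean_generations_approx(theta, K):
    """Approximation (5.13), one branch per regime of theta."""
    if theta < 1:
        return math.log(1.0 / (1.0 - theta)) / theta
    if theta == 1:
        return EULER_GAMMA + math.log(K)
    return theta ** K / (K * (theta - 1.0))


for theta in (0.8, 1.0, 1.2):
    print(theta,
          mean_generations_exact(theta, 100),
          mean_generations_approx(theta, 100))
```

For $\theta < 1$ and $\theta = 1$ the two columns agree to two or more decimal places at $K = 100$; for $\theta > 1$ the approximation captures the exponential growth in $K$ but sits a few percent below the exact sum at this value of $K$.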


Table 5.1 Mean generations to extinction for a population starting with a
single parent, where $\theta$ is the reproduction rate and $K$ is the
environmental capacity.


1.96 


2.01 ; 4.14 x 10° 
2.01 , 7.59 X 10” 
Example Sterile Male Insect Control The screwworm fly, a cattle
pest in warm climates, was eliminated from the southeastern United States 
by the release into the environment of sterilized adult male screwworm 
flies. When these males, artificially sterilized by radiation, mate with na- 
tive females, there are no offspring, and in this manner part of the repro- 
ductive capacity of the natural population is nullified by their presence. If 
the sterile males are sufficiently plentiful so as to cause even a small de- 
cline in the population level, then this decline accelerates in succeeding 
generations even if the number of sterile males is maintained at approxi- 
mately the same level, because the ratio of sterile to fertile males will in- 
crease as the natural population drops. Because of this compounding ef- 
fect, if the sterile male control method works at all, it works to such an 
extent as to drive the native population to extinction in the area in which 
it is applied. 

Recently, a multibillion-dollar effort involving the sterile male tech- 
nique has been proposed for the control of the cotton boll weevil. In this 
instance, it was felt that pretreatment with a pesticide could reduce the 
natural population size to a level such that the sterile male technique 
would become effective. Let us examine this assumption, first with a de- 
terministic model and then in a stochastic setting. 

For both models we suppose that sexes are present in equal numbers,
that sterile and fertile males are equally competitive, and that a constant
number $S$ of sterile males is present in each generation. In the
deterministic case, if $N_0$ fertile males are in the parent generation and the $N_0$ fertile
females choose mates equally likely from the entire male population, then
the fraction $N_0/(N_0 + S)$ of these matings will be with fertile males and
will produce offspring. Letting $\theta$ denote the number of male offspring that
results from a fertile mating, we calculate the size $N_1$ of the next
generation according to

$$N_1 = \theta N_0 \left( \frac{N_0}{N_0 + S} \right). \tag{5.14}$$

For a numerical example, suppose that there are $N_0 = 100$ fertile males
(and an equal number of fertile females) in the parent generation of the
native population, and that $S = 100$ sterile male insects are released. If
$\theta = 4$, meaning that a fertile mating produces four males (and four
females) for the succeeding generation, then the number of both sexes in the
first generation is

$$N_1 = 4(100) \left( \frac{100}{100 + 100} \right) = 200;$$

the population has increased, and the sterile male control method has
failed.

Table 5.2 The trend of an insect population subject to sterile male
releases.

              Number of Insects     Number of         Ratio Sterile   Number of
Generation    Natural Population    Sterile Insects   to Fertile      Progeny

Parent        20                    100               5:1             13.33
F_1           13.33                 100               7.5:1           6.27
F_2           6.27                  100               16:1            1.48
F_3           1.48                  100               67.5:1          0.09
F_4           0.09                  100               1156:1          -


On the other hand, if a pesticide can be used to reduce the initial
population size to $N_0 = 20$, or 20 percent of its former level, and $S = 100$
sterile males are released, then

$$N_1 = 4(20) \left( \frac{20}{20 + 100} \right) = 13.33,$$

and the population is declining. The succeeding population sizes are given
in Table 5.2, above. With the pretreatment, the population becomes extinct
by the fourth generation.
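The deterministic recursion (5.14) is easy to iterate directly. The sketch below reproduces the population sizes of Table 5.2; the function names are illustrative choices and not from the text.

```python
def next_generation(n, sterile, theta):
    """One step of (5.14): N_1 = theta * N_0 * N_0 / (N_0 + S)."""
    return theta * n * n / (n + sterile)


def trajectory(n0, sterile, theta, generations):
    """Population sizes for the parent generation and its successors."""
    sizes = [float(n0)]
    for _ in range(generations):
        sizes.append(next_generation(sizes[-1], sterile, theta))
    return sizes


# Without pretreatment the population doubles in one generation ...
print(trajectory(100, 100, 4, 1))
# ... while after pretreatment to N_0 = 20 it declines, as in Table 5.2.
print([round(x, 2) for x in trajectory(20, 100, 4, 4)])
```

The threshold between growth and decline is the fixed point of (5.14), $N_0 = S/(\theta - 1)$, which for $S = 100$ and $\theta = 4$ is about 33 fertile males.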

Often deterministic or average value models will adequately describe 
the evolution of large populations. But extinction is a small population 
phenomenon, and even in the presence of significant long term trends, 
small populations are strongly influenced by the chance fluctuations that 
determine which of extinction or recolonization will occur. This fact mo- 
tivates us to examine a stochastic model of the evolution of a population 
in the presence of sterile males. The factors in our model are 


$\lambda$, the individual birth rate;
$\mu$, the individual death rate;
$\theta = \lambda/\mu$, the mean offspring per individual;
$K$, the carrying capacity of the environment;
$S$, the constant number of sterile males in the population;
$m$, the initial population size.


We assume that both sexes are present in equal numbers in the natural
population and that $X(t)$, the number of either sex present at time $t$,
evolves as a birth and death process with parameters

$$\lambda_n = \begin{cases} \lambda n \left( \dfrac{n}{n + S} \right) & \text{if } 0 \le n < K, \\[2ex] 0 & \text{for } n \ge K, \end{cases}$$

and

$$\mu_n = \mu n \quad \text{for } n = 0, 1, \ldots. \tag{5.15}$$

This is the colonization model of the Population Processes example,
modified in analogy with (5.14) by including in the birth rate the factor
$n/(n + S)$ to represent the probability that a given mating will be fertile.

To calculate the mean time to extinction $w_m$ as given in (5.8), we first
use (5.15) to determine

$$\rho_k = \frac{\mu_1 \mu_2 \cdots \mu_k}{\lambda_1 \lambda_2 \cdots \lambda_k} = \left( \frac{\mu}{\lambda} \right)^k \frac{(k + S)!}{k!\, S!} \quad \text{for } k = 1, \ldots, K - 1,$$

$$\rho_0 = 1, \quad \text{and} \quad \rho_K = \infty, \text{ or } 1/\rho_K = 0,$$

and then substitute these expressions for $\rho_k$ into (5.8) to obtain

$$w_m = \frac{1}{\mu} \sum_{k=0}^{m-1} \sum_{j=k}^{K-1} \frac{1}{j + 1}\, \theta^{j-k}\, \frac{j!\,(S + k)!}{k!\,(S + j)!}. \tag{5.16}$$

Because of the factorials, equation (5.16) presents numerical difficulties
when direct computations are attempted. A simple iterative scheme works
to provide accurate and effective computation, however. We let

$$\alpha_k = \sum_{j=k}^{K-1} \frac{1}{j + 1}\, \theta^{j-k}\, \frac{j!\,(S + k)!}{k!\,(S + j)!},$$

so that $w_m = (\alpha_0 + \cdots + \alpha_{m-1})/\mu$. But it is easily verified that

$$\alpha_{k-1} = \frac{1}{k} + \theta \left( \frac{k}{S + k} \right) \alpha_k.$$

Beginning with $\alpha_K = 0$, one successively computes $\alpha_{K-1}, \alpha_{K-2}, \ldots, \alpha_0$
and then $w_m = (\alpha_0 + \cdots + \alpha_{m-1})/\mu$.

Using this method, we have computed the mean generations to
extinction in the stochastic model for comparison with the deterministic model
as given in Table 5.2. Table 5.3 lists the mean generations to extinction for
various initial population sizes $m$ when $K = S = 100$, $\lambda = 4$, and $\mu = 1$,
so that $\theta = 4$. Instead of the four generations to extinction as predicted by
the deterministic model when $m = 20$, we now estimate that the
population will persist for over 8 billion generations!
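The backward recursion for $\alpha_k$ translates into a few lines of code. The following is a sketch under the stated model (5.15), with a hypothetical function name; it returns the mean time in units of lifespans, $\mu w_m$, which is what Table 5.3 reports when $\mu = 1$.

```python
def mean_lifespans_to_extinction(m, K, S, theta):
    """Return mu * w_m = alpha_0 + ... + alpha_{m-1}.

    Uses alpha_K = 0 and the backward recursion
    alpha_{k-1} = 1/k + theta * k / (S + k) * alpha_k,
    which avoids the factorials in (5.16).
    """
    alpha = [0.0] * (K + 1)          # alpha[K] = 0 is the starting value
    for k in range(K, 0, -1):
        alpha[k - 1] = 1.0 / k + theta * (k / (S + k)) * alpha[k]
    return sum(alpha[:m])


# Parameters of Table 5.3: K = S = 100 and theta = 4.
for m in (1, 2, 3, 4, 5, 10, 20):
    print(m, mean_lifespans_to_extinction(m, 100, 100, 4.0))
```

Setting $S = 0$ removes the sterile males, and the recursion then returns the sum (5.12) for $M_g$, which gives a simple consistency check.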


Table 5.3 The mean lifespans to extinction in a birth and death model of a
population containing a constant number S = 100 of sterile males.

Initial            Mean Lifespans
Population Size    to Extinction

20                 8,101,227,748
10                 4,306,531
5                  3,822
4                  566
3                  65
2                  6.3
1                  1.2

What is the explanation for the dramatic difference between the
predictions of the deterministic model and the predictions of the stochastic
model? The stochastic model allows the small but positive probability that
the population will not die out but will recolonize and return to a higher
level near the environmental capacity $K$, and then persist for an enormous
length of time.

While both models are qualitative, the practical implications cannot be 
dismissed. In any large-scale control effort, a wide range of habitats and 
microenvironments is bound to be encountered. The stochastic model sug- 
gests the likely possibility that some subpopulation in some pocket might 
persist and later recolonize the entire area. A sterile male program that de- 
pends on a pretreatment with an insecticide for its success is chancy at 
best. 


Exercises 


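5.1. Assuming $\theta < 1$, verify the following steps in the approximation to
$M_g$, the mean generation to extinction as given in (5.12):

$$M_g = \sum_{i=1}^{K} \frac{1}{i}\, \theta^{i-1} = \theta^{-1} \sum_{i=1}^{K} \int_0^{\theta} x^{i-1}\, dx$$

$$= \theta^{-1} \int_0^{\theta} \frac{1 - x^K}{1 - x}\, dx = \theta^{-1} \int_0^{\theta} \frac{dx}{1 - x} - \theta^{-1} \int_0^{\theta} \frac{x^K}{1 - x}\, dx$$

$$= \frac{1}{\theta} \ln \frac{1}{1 - \theta} - \frac{1}{\theta} \int_0^{\theta} x^K (1 + x + x^2 + \cdots)\, dx$$

$$= \frac{1}{\theta} \ln \frac{1}{1 - \theta} - \frac{1}{\theta} \left( \frac{\theta^{K+1}}{K + 1} + \frac{\theta^{K+2}}{K + 2} + \cdots \right)$$

$$= \frac{1}{\theta} \ln \frac{1}{1 - \theta} - \frac{\theta^K}{K + 1} \left( 1 + \frac{K + 1}{K + 2}\, \theta + \frac{K + 1}{K + 3}\, \theta^2 + \cdots \right)$$

$$\approx \frac{1}{\theta} \ln \frac{1}{1 - \theta} - \frac{\theta^K}{(K + 1)(1 - \theta)}.$$


5.2. Assume that $\theta > 1$ and verify the following steps in the
approximation to $M_g$, the mean generation to extinction as given in (5.12):

$$M_g = \sum_{i=1}^{K} \frac{1}{i}\, \theta^{i-1} = \theta^K \sum_{i=1}^{K} \frac{1}{i} \left( \frac{1}{\theta} \right)^{K-i+1}$$

$$\approx \frac{\theta^{K-1}}{K} \left[ \frac{1}{1 - (1/\theta)} \right] = \frac{\theta^K}{K(\theta - 1)}.$$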


Problems 


5.1. Consider the sterile male control model as described in the
example entitled “Sterile Male Insect Control” and let $u_m$ be the probability that
the population becomes extinct before growing to size $K$ starting with
$X(0) = m$ individuals. Show that

$$u_m = \frac{\sum_{i=m}^{K-1} \rho_i}{\sum_{i=0}^{K-1} \rho_i} \quad \text{for } m = 1, \ldots, K,$$

where

$$\rho_i = \theta^{-i}\, \frac{(S + i)!}{i!}.$$
5.2. Consider a birth and death process on the states 0, 1, ..., 5 with
parameters

$$\lambda_0 = \mu_0 = \lambda_5 = \mu_5 = 0,$$
$$\lambda_1 = 1, \quad \lambda_2 = 2, \quad \lambda_3 = 3, \quad \lambda_4 = 4,$$
$$\mu_1 = 4, \quad \mu_2 = 3, \quad \mu_3 = 2, \quad \mu_4 = 1.$$

Note that 0 and 5 are absorbing states. Suppose the process begins in state
$X(0) = 2$.


(a) What is the probability of eventual absorption in state 0?
(b) What is the mean time to absorption?


A continuous time Markov chain $X(t)$ ($t > 0$) is a Markov process on the
states 0, 1, 2, .... We assume as usual that the transition probabilities are
stationary; i.e.,

$$P_{ij}(t) = \Pr\{X(t + s) = j \mid X(s) = i\}. \tag{6.1}$$

In this section we consider only the case where the state space $S$ is finite,
labeled as $\{0, 1, 2, \ldots, N\}$.
The Markov property asserts that $P_{ij}(t)$ satisfies

(a) $P_{ij}(t) \ge 0$,

(b) $\sum_{j=0}^{N} P_{ij}(t) = 1$, $\quad i, j = 0, 1, \ldots, N$, and

(c) $P_{ik}(s + t) = \sum_{j=0}^{N} P_{ij}(s) P_{jk}(t)$ for $t, s \ge 0$
(Chapman-Kolmogorov relation),

and we postulate in addition that

(d) $\lim_{t \to 0+} P_{ij}(t) = \begin{cases} 1, & i = j, \\ 0, & i \ne j. \end{cases}$

If $\mathbf{P}(t)$ denotes the matrix $\|P_{ij}(t)\|_{i,j=0}^{N}$, then property (c) can be written
compactly in matrix notation as

$$\mathbf{P}(t + s) = \mathbf{P}(t)\mathbf{P}(s), \quad t, s \ge 0. \tag{6.2}$$

Property (d) asserts that $\mathbf{P}(t)$ is continuous at $t = 0$, since the fact $\mathbf{P}(0) = \mathbf{I}$
(= identity matrix) is implied by (6.2). It follows simply from (6.2) that
$\mathbf{P}(t)$ is continuous for all $t > 0$. In fact, if $s = h > 0$ in (6.2), then because
of (d), we have

$$\lim_{h \to 0+} \mathbf{P}(t + h) = \mathbf{P}(t) \lim_{h \to 0+} \mathbf{P}(h) = \mathbf{P}(t)\mathbf{I} = \mathbf{P}(t). \tag{6.3}$$

On the other hand, for $t > 0$ and $0 < h < t$ we write (6.2) in the form

$$\mathbf{P}(t) = \mathbf{P}(t - h)\mathbf{P}(h). \tag{6.4}$$

But $\mathbf{P}(h)$ is near the identity when $h$ is sufficiently small, and so $\mathbf{P}(h)^{-1}$
[the inverse of $\mathbf{P}(h)$] exists and also approaches the identity $\mathbf{I}$. Therefore,

$$\mathbf{P}(t) = \mathbf{P}(t) \lim_{h \to 0+} (\mathbf{P}(h))^{-1} = \lim_{h \to 0+} \mathbf{P}(t - h). \tag{6.5}$$


The limit relations (6.3) and (6.5) together show that P(t) is continuous. 
Actually, P(t) is not only continuous but also differentiable in that the 
limits 


$$\lim_{h \to 0+} \frac{1 - P_{ii}(h)}{h} = q_i,$$
$$\lim_{h \to 0+} \frac{P_{ij}(h)}{h} = q_{ij}, \quad i \ne j, \tag{6.6}$$

exist, where $0 \le q_{ij} < \infty$ ($i \ne j$) and $0 \le q_i < \infty$. Starting with the relation

$$1 - P_{ii}(h) = \sum_{j=0,\, j \ne i}^{N} P_{ij}(h),$$

dividing by $h$, and letting $h$ decrease to zero yields directly the relation

$$q_i = \sum_{j=0,\, j \ne i}^{N} q_{ij}.$$

The rates $q_{ij}$ and $q_i$ furnish an infinitesimal description of the process
with

$$\Pr\{X(t + h) = j \mid X(t) = i\} = q_{ij} h + o(h) \quad \text{for } i \ne j,$$
$$\Pr\{X(t + h) = i \mid X(t) = i\} = 1 - q_i h + o(h).$$

In contrast to the infinitesimal description, the sojourn description of
the process proceeds as follows: Starting in state $i$, the process sojourns
there for a duration that is exponentially distributed with parameter $q_i$.
The process then jumps to state $j \ne i$ with probability $p_{ij} = q_{ij}/q_i$; the
sojourn time in state $j$ is exponentially distributed with parameter $q_j$, and so
on. The sequence of states visited by the process, denoted by $\xi_0, \xi_1, \ldots$,
is a Markov chain with discrete parameter, called the embedded Markov
chain. Conditioned on the state sequence $\xi_0, \xi_1, \ldots$, the successive
sojourn times $S_0, S_1, \ldots$ are independent exponentially distributed random
variables with parameters $q_{\xi_0}, q_{\xi_1}, \ldots$, respectively.


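Assuming that (6.6) has been verified, we now derive an explicit
expression for $P_{ij}(t)$ in terms of the infinitesimal matrix

$$\mathbf{A} = \begin{pmatrix} -q_0 & q_{01} & \cdots & q_{0N} \\ q_{10} & -q_1 & \cdots & q_{1N} \\ \vdots & \vdots & & \vdots \\ q_{N0} & q_{N1} & \cdots & -q_N \end{pmatrix}.$$

The limit relations (6.6) can be expressed concisely in matrix form:

$$\lim_{h \to 0+} \frac{\mathbf{P}(h) - \mathbf{I}}{h} = \mathbf{A}, \tag{6.7}$$

which shows that $\mathbf{A}$ is the matrix derivative of $\mathbf{P}(t)$ at $t = 0$. Formally,
$\mathbf{A} = \mathbf{P}'(0)$.
With the aid of (6.7) and referring to (6.2), we have

$$\frac{\mathbf{P}(t + h) - \mathbf{P}(t)}{h} = \frac{\mathbf{P}(t)[\mathbf{P}(h) - \mathbf{I}]}{h} = \frac{\mathbf{P}(h) - \mathbf{I}}{h}\, \mathbf{P}(t). \tag{6.8}$$

The limit on the right exists, and this leads to the matrix differential
equation

$$\mathbf{P}'(t) = \mathbf{P}(t)\mathbf{A} = \mathbf{A}\mathbf{P}(t), \tag{6.9}$$

where $\mathbf{P}'(t)$ denotes the matrix whose elements are $P'_{ij}(t) = dP_{ij}(t)/dt$. The
existence of $P'_{ij}(t)$ is an obvious consequence of (6.7) and (6.8). The
differential equations (6.9) under the initial condition $\mathbf{P}(0) = \mathbf{I}$ can be solved
by standard methods to yield the formula

$$\mathbf{P}(t) = e^{\mathbf{A}t} = \mathbf{I} + \sum_{n=1}^{\infty} \frac{\mathbf{A}^n t^n}{n!}. \tag{6.10}$$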

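Example The Two State Markov Chain Consider a Markov chain
$\{X(t)\}$ with states $\{0, 1\}$ whose infinitesimal matrix is

$$\mathbf{A} = \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix}.$$

The process alternates between states 0 and 1. The sojourn times in state
0 are independent and exponentially distributed with parameter $\alpha$. Those
in state 1 are independent and exponentially distributed with parameter $\beta$.
We carry out the matrix multiplication

$$\begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix} \times \begin{pmatrix} -\alpha & \alpha \\ \beta & -\beta \end{pmatrix} = \begin{pmatrix} \alpha^2 + \alpha\beta & -\alpha^2 - \alpha\beta \\ -\beta^2 - \alpha\beta & \beta^2 + \alpha\beta \end{pmatrix}$$

to see that $\mathbf{A}^2 = -(\alpha + \beta)\mathbf{A}$. Repeated multiplication by $\mathbf{A}$ then yields

$$\mathbf{A}^n = [-(\alpha + \beta)]^{n-1} \mathbf{A},$$

which when inserted into (6.10) simplifies the sum according to

$$\mathbf{P}(t) = \mathbf{I} - \frac{1}{\alpha + \beta} \sum_{n=1}^{\infty} \frac{[-(\alpha + \beta)t]^n}{n!}\, \mathbf{A}$$
$$= \mathbf{I} - \frac{1}{\alpha + \beta} \left[ e^{-(\alpha + \beta)t} - 1 \right] \mathbf{A}$$
$$= \mathbf{I} + \frac{1}{\alpha + \beta}\, \mathbf{A} - \frac{1}{\alpha + \beta}\, \mathbf{A} e^{-(\alpha + \beta)t}.$$

And with $\pi = \alpha/(\alpha + \beta)$ and $\tau = \alpha + \beta$,

$$\mathbf{P}(t) = \begin{pmatrix} 1 - \pi & \pi \\ 1 - \pi & \pi \end{pmatrix} + \begin{pmatrix} \pi & -\pi \\ -(1 - \pi) & 1 - \pi \end{pmatrix} e^{-\tau t},$$

which is the matrix expression for equations (3.14a-d).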
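As a numerical check of this example, the series (6.10) can be summed term by term and compared with the closed form. The sketch below uses plain Python lists so that it is self-contained; the function names, the truncation at 60 terms, and the sample values $\alpha = 2$, $\beta = 3$, $t = 0.4$ are illustrative assumptions.

```python
import math


def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transition_matrix_series(A, t, terms=60):
    """Truncated series (6.10): I + sum over n of A^n t^n / n!."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # A^0 t^0 / 0! = I
    for k in range(1, terms + 1):
        term = mat_mul(term, A)                  # multiply by A ...
        term = [[x * t / k for x in row] for row in term]   # ... and by t/k
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def two_state_closed_form(alpha, beta, t):
    """Closed form with pi = alpha/(alpha+beta) and tau = alpha+beta."""
    tau = alpha + beta
    pi = alpha / tau
    decay = math.exp(-tau * t)
    return [[(1 - pi) + pi * decay, pi - pi * decay],
            [(1 - pi) - (1 - pi) * decay, pi + (1 - pi) * decay]]


print(transition_matrix_series([[-2.0, 2.0], [3.0, -3.0]], 0.4))
print(two_state_closed_form(2.0, 3.0, 0.4))
```

A fixed-length series is adequate here because the entries of $\mathbf{A}t$ are small; for large $t$ or stiff rate matrices a dedicated matrix exponential routine would be the safer choice.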
Returning to the general Markov chain on states $\{0, 1, \ldots, N\}$, when
the chain is irreducible (all states communicate) then $P_{ij}(t) > 0$ for
$i, j = 0, 1, \ldots, N$ and $\lim_{t \to \infty} P_{ij}(t) = \pi_j > 0$ exists independently of the
initial state $i$. The limiting distribution may be found by passing to the
limit in (6.9), noting that $\lim_{t \to \infty} \mathbf{P}'(t) = \mathbf{0}$. The resulting equations for
$\pi = (\pi_0, \pi_1, \ldots, \pi_N)$ are

$$\mathbf{0} = \pi\mathbf{A} = (\pi_0, \pi_1, \ldots, \pi_N) \begin{pmatrix} -q_0 & q_{01} & \cdots & q_{0N} \\ q_{10} & -q_1 & \cdots & q_{1N} \\ \vdots & \vdots & & \vdots \\ q_{N0} & q_{N1} & \cdots & -q_N \end{pmatrix},$$

which is the same as

$$\pi_j q_j = \sum_{i \ne j} \pi_i q_{ij}, \quad j = 0, 1, \ldots, N. \tag{6.11}$$

Equation (6.11) together with

$$\pi_0 + \pi_1 + \cdots + \pi_N = 1 \tag{6.12}$$

determines the limiting distribution.

Equation (6.11) has a mass balance interpretation that aids us in
understanding it. The left side $\pi_j q_j$ represents the long run rate at which
particles executing the Markov process leave state $j$. This rate must equal the
long run rate at which particles arrive at state $j$ if equilibrium is to be
maintained. Such arriving particles must come from some state $i \ne j$, and
a particle moves from state $i \ne j$ to state $j$ at rate $q_{ij}$. Therefore, the right
side $\sum_{i \ne j} \pi_i q_{ij}$ represents the total rate of arriving particles.


Example Industrial Mobility and the Peter Principle Let us suppose
that a draftsman position at a large engineering firm can be occupied by a
worker at any of three levels: $T$ = Trainee, $J$ = Junior Draftsman, and
$S$ = Senior Draftsman. Let $X(t)$ denote the level of the person in the
position at time $t$, and suppose that $X(t)$ evolves as a Markov chain whose
infinitesimal matrix is

$$\mathbf{A} = \begin{array}{c|ccc} & T & J & S \\ \hline T & -a_T & a_T & 0 \\ J & a_{JT} & -a_J & a_{JS} \\ S & a_S & 0 & -a_S \end{array}$$

Thus a Trainee stays at that rank for an exponentially distributed time
having parameter $a_T$ and then becomes a Junior Draftsman. A Junior
Draftsman stays at that level for an exponentially distributed length of time
having parameter $a_J = a_{JT} + a_{JS}$. Then the Junior Draftsman leaves the
position and is replaced by a Trainee with probability $a_{JT}/a_J$ or is promoted
to a Senior Draftsman with probability $a_{JS}/a_J$, and so on.

Alternatively, we may describe the model by specifying the movements
during short time intervals according to

$$\Pr\{X(t + h) = J \mid X(t) = T\} = a_T h + o(h),$$
$$\Pr\{X(t + h) = T \mid X(t) = J\} = a_{JT} h + o(h),$$
$$\Pr\{X(t + h) = S \mid X(t) = J\} = a_{JS} h + o(h),$$
$$\Pr\{X(t + h) = T \mid X(t) = S\} = a_S h + o(h),$$

and

$$\Pr\{X(t + h) = i \mid X(t) = i\} = 1 - a_i h + o(h) \quad \text{for } i = T, J, S.$$

The equations for the equilibrium distribution $(\pi_T, \pi_J, \pi_S)$ are,
according to (6.11),

$$a_T \pi_T = a_{JT} \pi_J + a_S \pi_S,$$
$$a_J \pi_J = a_T \pi_T,$$
$$a_S \pi_S = a_{JS} \pi_J,$$
$$1 = \pi_T + \pi_J + \pi_S,$$

and the solution is

$$\pi_T = \frac{a_S a_J}{a_S a_J + a_S a_T + a_T a_{JS}},$$

$$\pi_J = \frac{a_S a_T}{a_S a_J + a_S a_T + a_T a_{JS}},$$

$$\pi_S = \frac{a_T a_{JS}}{a_S a_J + a_S a_T + a_T a_{JS}}.$$

Let us consider a numerical example for comparison with an alternative
model to be developed later. We suppose that the mean times in the three
states are

State    Mean Time

T        0.1
J        0.2
S        1.0

and that a Junior Draftsman leaves and is replaced by a Trainee with
probability $\frac{2}{5}$ and is promoted to a Senior Draftsman with probability $\frac{3}{5}$. These
suppositions lead to the prescription $a_T = 10$, $a_{JT} = 2$, $a_{JS} = 3$, $a_S = 1$. The
equilibrium probabilities are

$$\pi_T = \frac{1(5)}{1(5) + 1(10) + 10(3)} = \frac{5}{45} = 0.11,$$

$$\pi_J = \frac{10}{45} = 0.22,$$

$$\pi_S = \frac{30}{45} = 0.67.$$
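The arithmetic above can be reproduced exactly with rational numbers. This is a small sketch; the function name and argument names are illustrative choices mirroring the rates $a_T$, $a_{JT}$, $a_{JS}$, $a_S$ of the example.

```python
from fractions import Fraction


def equilibrium_three_level(a_T, a_JT, a_JS, a_S):
    """Closed-form equilibrium (pi_T, pi_J, pi_S) for the three-level model."""
    a_J = a_JT + a_JS                       # total rate of leaving level J
    denom = a_S * a_J + a_S * a_T + a_T * a_JS
    return (Fraction(a_S * a_J, denom),
            Fraction(a_S * a_T, denom),
            Fraction(a_T * a_JS, denom))


pi_T, pi_J, pi_S = equilibrium_three_level(10, 2, 3, 1)
print(pi_T, pi_J, pi_S)                     # fractions with denominator 9 or 3
# Balance check for state T from (6.11): a_T pi_T = a_JT pi_J + a_S pi_S.
print(10 * pi_T == 2 * pi_J + 1 * pi_S)
```

Working with `Fraction` keeps the balance equations exact, so the check of (6.11) is an equality rather than a floating-point comparison.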


But the duration that people spend in any given position is not expo- 
nentially distributed in general. A bimodal distribution is often observed 
in which many people leave rather quickly, while others persist for a sub- 
stantial time. A possible explanation for this phenomenon is found in the 
“Peter Principle” (Peter, Laurence J. and Raymond Hull, The Peter Principle,
Buccaneer Books, Cutchogue, New York, 1969), which asserts that a worker is
promoted until finally
reaching a position in which he or she is incompetent. When this happens,
the worker stays in that job until retirement. Let us modify the industrial
mobility model to accommodate the Peter Principle by considering two
types of Junior Draftsmen, Competent and Incompetent. We suppose that
a fraction $p$ of Trainees are Competent, and $q = 1 - p$ are Incompetent.
We assume that a competent Junior Draftsman stays at that level for an
exponentially distributed duration with parameter $a_C$ and then is promoted to
Senior Draftsman. Finally, an incompetent Junior Draftsman stays in that
position until retirement, an exponentially distributed sojourn with
parameter $a_I$, and then he or she is replaced by a Trainee. The relevant
infinitesimal matrix is given by


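$$
A = \begin{array}{c|cccc}
  & T & I & C & S\\ \hline
T & -a_T & q a_T & p a_T & 0\\
I & a_I & -a_I & 0 & 0\\
C & 0 & 0 & -a_C & a_C\\
S & a_S & 0 & 0 & -a_S
\end{array}
$$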


The duration in the Junior Draftsman position now follows a probability
law that is a mixture of exponential densities. To compare this model
with the previous model, suppose that $p = \frac{3}{5}$, $q = \frac{2}{5}$, $a_I = 2.86$, and $a_C = 10$.
These numbers were chosen so as to make the mean duration as a Junior
Draftsman,


$$p\left(\frac{1}{a_C}\right) + q\left(\frac{1}{a_I}\right) = \frac{3}{5}(0.10) + \frac{2}{5}(0.35) = 0.20,$$


the same as in the previous calculations. The probability density of this 
duration is 


$$f(t) = \tfrac{3}{5}(10)e^{-10t} + \tfrac{2}{5}(2.86)e^{-2.86t} \qquad \text{for } t \ge 0.$$


This density is plotted in Figure 6.1 for comparison with the
exponential density $g(t) = 5e^{-5t}$, which has the same mean. The bimodal
tendency is indicated in that $f(t) > g(t)$ when $t$ is near zero and when $t$ is
very large.
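
The comparison between the two densities can be checked numerically. The sketch below simply evaluates $f$ and $g$ at three illustrative time points of our own choosing (0, 0.3, and 2.0); it is a sanity check, not a reproduction of Figure 6.1.

```python
import math

P, Q = 3 / 5, 2 / 5      # fractions of Competent and Incompetent Trainees
A_C, A_I = 10.0, 2.86    # exit rates of the two types of Junior Draftsman

def f(t):
    """Mixed exponential density of the Junior Draftsman duration."""
    return P * A_C * math.exp(-A_C * t) + Q * A_I * math.exp(-A_I * t)

def g(t):
    """Exponential density with rate 5, i.e. mean 0.20."""
    return 5.0 * math.exp(-5.0 * t)

mean_f = P / A_C + Q / A_I  # mean of the mixture, close to 0.20

for t in (0.0, 0.3, 2.0):
    print(t, f(t) > g(t))   # True near zero and for large t, False in between
```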

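With the numbers as given and $a_T = 10$ and $a_S = 1$ as before, the
stationary distribution $(\pi_T, \pi_I, \pi_C, \pi_S)$ is found by solving


$$
\begin{aligned}
10\pi_T &= 2.86\pi_I + 1\pi_S,\\
2.86\pi_I &= 4\pi_T,\\
10\pi_C &= 6\pi_T,\\
1\pi_S &= 10\pi_C,\\
1 &= \pi_T + \pi_I + \pi_C + \pi_S.
\end{aligned}
$$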
Figure 6.1. The exponential density (straight line) versus the mixed
exponential density (curved line). Both distributions have the same
mean. A logarithmic scale was used to accentuate the differences.


The solution is

$$\pi_T = 0.111,\qquad \pi_I = 0.155,\qquad \pi_S = 0.667,\qquad \pi_C = 0.067.$$


Let us make two observations before leaving this example. First, the 
limiting probabilities $\pi_T$, $\pi_S$, and $\pi_J = \pi_I + \pi_C$ agree between the two
models. This is a common occurrence in stochastic modeling, wherein the 
limiting behavior of a process is rather insensitive to certain details of the 
model and depends only on the first moments, or means. When this hap- 
pens, the model assumptions can be chosen for their mathematical conve- 
nience with no loss. 

The second observation is specific to the Peter Principle. We have
assumed that $p = \frac{3}{5}$ of Trainees are competent Junior Draftsmen and only
$q = \frac{2}{5}$ are Incompetent. Yet in the long run, a Junior Draftsman is found to
be Incompetent with probability $\pi_I/(\pi_I + \pi_C) = 0.155/(0.155 + 0.067) = 0.70$!

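
The four-state stationary distribution can be recovered by expressing every probability as a multiple of $\pi_T$ and then normalizing. The sketch below follows the balance equations of the Peter Principle model; the function name and the dictionary layout are illustrative choices of ours.

```python
def peter_stationary(p, a_T, a_I, a_C, a_S):
    """Stationary distribution of the Trainee/Incompetent/Competent/Senior chain."""
    q = 1.0 - p
    w_T = 1.0
    w_I = q * a_T / a_I * w_T   # a_I * pi_I = q * a_T * pi_T
    w_C = p * a_T / a_C * w_T   # a_C * pi_C = p * a_T * pi_T
    w_S = a_C / a_S * w_C       # a_S * pi_S = a_C * pi_C
    total = w_T + w_I + w_C + w_S
    return {"T": w_T / total, "I": w_I / total, "C": w_C / total, "S": w_S / total}

pi = peter_stationary(p=0.6, a_T=10.0, a_I=2.86, a_C=10.0, a_S=1.0)
incompetent_share = pi["I"] / (pi["I"] + pi["C"])
print({k: round(v, 3) for k, v in pi.items()}, round(incompetent_share, 2))
```

With the numbers of the example this gives the values 0.111, 0.155, 0.067, and 0.667 quoted above, and an Incompetent share of about 0.70.
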
Example Redundancy and the Burn-In Phenomenon An airline
reservation system has two computers, one on-line and one backup. The
operating computer fails after an exponentially distributed duration
having parameter $\mu$ and is replaced by the standby. There is one repair
facility, and repair times are exponentially distributed with parameter $\lambda$. Let
$X(t)$ be the number of computers in operating condition at time $t$. Then
$X(t)$ is a Markov chain whose infinitesimal matrix is


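$$
A = \begin{array}{c|ccc}
  & 0 & 1 & 2\\ \hline
0 & -\lambda & \lambda & 0\\
1 & \mu & -(\lambda + \mu) & \lambda\\
2 & 0 & \mu & -\mu
\end{array}
$$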


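The stationary distribution $(\pi_0, \pi_1, \pi_2)$ satisfies


$$
\begin{aligned}
\lambda\pi_0 &= \mu\pi_1,\\
(\lambda + \mu)\pi_1 &= \lambda\pi_0 + \mu\pi_2,\\
\mu\pi_2 &= \lambda\pi_1,\\
1 &= \pi_0 + \pi_1 + \pi_2,
\end{aligned}
$$


and the solution is


$$
\pi_0 = \frac{1}{1 + (\lambda/\mu) + (\lambda/\mu)^2},\qquad
\pi_1 = \frac{\lambda/\mu}{1 + (\lambda/\mu) + (\lambda/\mu)^2},\qquad
\pi_2 = \frac{(\lambda/\mu)^2}{1 + (\lambda/\mu) + (\lambda/\mu)^2}.
$$


The availability, or probability that at least one computer is operating, is
$1 - \pi_0 = \pi_1 + \pi_2$.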
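
As a small illustration, the availability can be computed directly from the ratio $\lambda/\mu$. The numerical rates used below (repair rate 10, failure rate 1) are hypothetical values chosen only for the example.

```python
def redundancy_stationary(lam, mu):
    """(pi_0, pi_1, pi_2) for repair rate lam and failure rate mu."""
    rho = lam / mu
    norm = 1.0 + rho + rho ** 2
    return (1.0 / norm, rho / norm, rho ** 2 / norm)

def availability(lam, mu):
    """Probability that at least one computer is operating."""
    pi0, _, _ = redundancy_stationary(lam, mu)
    return 1.0 - pi0

# Hypothetical rates: repairs are ten times faster than failures.
print(availability(lam=10.0, mu=1.0))
```

With these rates the availability equals 110/111, so the system is down a little under one percent of the time.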

Often in practice the assumption of exponentially distributed operating 
times is not realistic because of the so-called burn-in phenomenon. This 
idea is best explained in terms of the hazard rate r(t) associated with a 
probability density function f(t) of a nonnegative failure time T. Recall 
that $r(t)\,\Delta t$ measures the conditional probability that the item fails in the
next time interval $(t, t + \Delta t)$ given that it has survived up to time $t$, and
therefore we have


$$r(t) = \frac{f(t)}{1 - F(t)},$$

where $F(t)$ is the cumulative distribution function associated with the
probability density function $f(t)$.

A constant hazard rate $r(t) = \lambda$ for all $t$ corresponds to the exponential
density function $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$. The burn-in phenomenon is
described by a hazard rate that is initially high and then decays to a constant
level, where it persists, possibly later to rise again (aging). It corresponds 
to a situation in which a newly manufactured or newly repaired item has 
a significant probability of failing early in its use. If the item survives this 
test period, however, it then operates in an exponential or memoryless 
manner. The early failures might correspond to incorrect manufacture or 
faulty repair, or might be a property of the materials used. 

Anyone familiar with automobile repairs has experienced the burn-in 
phenomenon. 

One of many possible ways to model the burn-in phenomenon is to use
a mixture of exponential densities


$$f(t) = p\alpha e^{-\alpha t} + q\beta e^{-\beta t}, \qquad t \ge 0, \tag{6.13}$$


where $0 < p = 1 - q < 1$ and $\alpha$, $\beta$ are positive. The density function for
which $p = 0.1$, $\alpha = 10$, $q = 0.9$, and $\beta = 0.909\cdots = 1/1.1$ has mean one.
Its hazard rate is plotted in Figure 6.2, where the
higher initial burn-in level is evident.


$$r(t) = \frac{p\alpha e^{-\alpha t} + q\beta e^{-\beta t}}{p e^{-\alpha t} + q e^{-\beta t}} \qquad \text{for } t \ge 0.$$


Figure 6.2 The hazard rate $r(t)$, plotted against $t$ for $0 \le t \le 0.8$, corresponding
to the density given in (6.13). The higher hazard rate at the initial $t$ values
represents the burn-in phenomenon.
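
The burn-in shape of the hazard rate is easy to tabulate. The sketch below uses the survival function $1 - F(t) = p e^{-\alpha t} + q e^{-\beta t}$ of the mixture (6.13) together with the parameter values quoted in the text; the function name `hazard` and the sample time points are our own.

```python
import math

def hazard(t, p=0.1, alpha=10.0, beta=1 / 1.1):
    """Hazard rate r(t) = f(t) / (1 - F(t)) of the mixed exponential (6.13)."""
    q = 1.0 - p
    density = p * alpha * math.exp(-alpha * t) + q * beta * math.exp(-beta * t)
    survival = p * math.exp(-alpha * t) + q * math.exp(-beta * t)
    return density / survival

for t in (0.0, 0.1, 0.2, 0.4, 0.8):
    print(t, round(hazard(t), 3))
```

The rate starts at $p\alpha + q\beta$ and decays toward the constant level $\beta$.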

We may incorporate the burn-in phenomenon corresponding to the 
mixed exponential density (6.13) by expanding the state space according 
to the following table: 


Notation   State
0          Both computers down
$1_A$      One operating computer, current up time has parameter $\alpha$
$1_B$      One operating computer, current up time has parameter $\beta$
$2_A$      Two operating computers, current up time has parameter $\alpha$
$2_B$      Two operating computers, current up time has parameter $\beta$


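Equation (6.13) corresponds to a probability $p$ that a computer beginning
operation will have an exponentially distributed up time with
parameter $\alpha$ and a probability $q$ that the parameter is $\beta$. Accordingly, we have
the infinitesimal matrix


$$
A = \begin{array}{c|ccccc}
    & 0 & 1_A & 1_B & 2_A & 2_B\\ \hline
0   & -\lambda & p\lambda & q\lambda & 0 & 0\\
1_A & \alpha & -(\lambda + \alpha) & 0 & \lambda & 0\\
1_B & \beta & 0 & -(\lambda + \beta) & 0 & \lambda\\
2_A & 0 & p\alpha & q\alpha & -\alpha & 0\\
2_B & 0 & p\beta & q\beta & 0 & -\beta
\end{array}
$$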


The stationary distribution can be determined in the usual way by ap- 
plying (6.11). 
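
For the five-state chain a closed form is less convenient, so one may solve the balance equations numerically. The following sketch assumes the state order 0, $1_A$, $1_B$, $2_A$, $2_B$ and solves $\pi A = 0$ with $\sum_i \pi_i = 1$ by Gaussian elimination; the repair rate `lam = 2.0` is a hypothetical value, and both function names are ours.

```python
def stationary(A):
    """Solve pi A = 0 together with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(A)
    # Row i of M is the balance equation for state i (column i of A);
    # the last, redundant equation is replaced by the normalization.
    M = [[A[j][i] for j in range(n)] + [0.0] for i in range(n)]
    M[-1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def burn_in_matrix(lam, p, alpha, beta):
    """Infinitesimal matrix in the state order 0, 1_A, 1_B, 2_A, 2_B."""
    q = 1.0 - p
    return [
        [-lam, p * lam, q * lam, 0.0, 0.0],
        [alpha, -(lam + alpha), 0.0, lam, 0.0],
        [beta, 0.0, -(lam + beta), 0.0, lam],
        [0.0, p * alpha, q * alpha, -alpha, 0.0],
        [0.0, p * beta, q * beta, 0.0, -beta],
    ]

# lam = 2.0 is a hypothetical repair rate; p, alpha, beta are as in the text.
pi = stationary(burn_in_matrix(lam=2.0, p=0.1, alpha=10.0, beta=1 / 1.1))
print("availability:", 1.0 - pi[0])
```

When $\alpha = \beta = \mu$ the expanded chain collapses to the earlier three-state model, which gives a convenient consistency check on the matrix.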


Exercises 


6.1. A certain type of component has two states: 0 = OFF and 1 = OPERATING.
In state 0, the process remains there a random length of time,
which is exponentially distributed with parameter $\alpha$, and then moves to
state 1. The time in state 1 is exponentially distributed with parameter $\beta$,
after which the process returns to state 0.

The system has two of these components, A and B, with distinct para- 
meters: 


Component   Operating Failure Rate   Repair Rate


A           $\beta_A$                $\alpha_A$
B           $\beta_B$                $\alpha_B$


In order for the system to operate, at least one of components A and B 
must be operating (a parallel system). Assume that the component sto- 
chastic processes are independent of one another. Determine the long run 
probability that the system is operating by 
(a) Considering each component separately as a two-state Markov 
chain and using their statistical independence; 
(b) Considering the system as a four-state Markov chain and solving 
equations (6.11). 


6.2. Let $X_1(t)$ and $X_2(t)$ be independent two-state Markov chains having
the same infinitesimal matrix


$$
A = \begin{array}{c|cc}
  & 0 & 1\\ \hline
0 & -\lambda & \lambda\\
1 & \mu & -\mu
\end{array}
$$


Argue that $Z(t) = X_1(t) + X_2(t)$ is a Markov chain on the state space
$S = \{0, 1, 2\}$ and determine the transition probability matrix $P(t)$ for $Z(t)$.


Problems 


6.1. Let $Y_n$, $n = 0, 1, \ldots$, be a discrete time Markov chain with
transition probabilities $P = \|P_{ij}\|$, and let $\{N(t); t \ge 0\}$ be an independent
Poisson process of rate $\lambda$. Argue that the compound process


$$X(t) = Y_{N(t)}, \qquad t \ge 0,$$


is a Markov chain in continuous time and determine its infinitesimal
parameters.
6.2. A certain type of component has two states: 0 = OFF and 1 =
OPERATING. In state 0, the process remains there a random length of
time, which is exponentially distributed with parameter $\alpha$, and then moves
to state 1. The time in state 1 is exponentially distributed with parameter
$\beta$, after which the process returns to state 0.

The system has three of these components, A, B, and C, with distinct 
parameters: 


Component   Operating Failure Rate   Repair Rate


A           $\beta_A$                $\alpha_A$
B           $\beta_B$                $\alpha_B$
C           $\beta_C$                $\alpha_C$


In order for the system to operate, component A must be operating, and at 
least one of components B and C must be operating. In the long run, what 
fraction of time does the system operate? Assume that the component sto- 
chastic processes are independent of one another. 


6.3. Let $X_1(t), X_2(t), \ldots, X_N(t)$ be independent two-state Markov chains
having the same infinitesimal matrix


$$
A = \begin{array}{c|cc}
  & 0 & 1\\ \hline
0 & -\lambda & \lambda\\
1 & \mu & -\mu
\end{array}
$$


Determine the infinitesimal matrix for the Markov chain $Z(t) =
X_1(t) + \cdots + X_N(t)$.


6.4. A system consists of two units, both of which may operate
simultaneously, and a single repair facility. The probability that an operating
system will fail in a short time interval of length $\Delta t$ is $\mu(\Delta t) + o(\Delta t)$.
Repair times are exponentially distributed, but the parameter depends on
whether the failure was regular or severe. The fraction of regular failures
is $p$, and the corresponding exponential parameter is $\alpha$. The fraction of
severe failures is $q = 1 - p$, and the exponential parameter is $\beta < \alpha$.
Model the system as a continuous time Markov chain by taking as
states the pairs $(x, y)$, where $x = 0, 1, 2$ is the number of units operating
and $y = 0, 1, 2$ is the number of units undergoing repair for a severe
failure. The possible states are (2, 0), (1, 0), (1, 1), (0, 0), (0, 1), and (0, 2).
Specify the infinitesimal matrix $A$. Assume that the units enter the repair
shop on a first come, first served basis.


7. A Poisson Process with a Markov Intensity*

*Starred sections contain material of a more specialized or advanced nature.


Consider “points” scattered in some manner along the semi-infinite
interval $[0, \infty)$, and for an interval of the form $I = (a, b]$, with $0 \le a < b < \infty$,
let $N(I)$ count the number of “points” in the interval $I$. Then $N(I)$, as $I$
ranges over the half-open intervals $I = (a, b]$, is a point process. (See V,
Section 5 for generalizations to higher dimensions.) Suppose that,
conditional on a given intensity, $N(I)$ is a nonhomogeneous Poisson process,
but where the intensity function $\{\lambda(t), t \ge 0\}$ is itself a stochastic process.
Such point processes were introduced in V, Section 1.4, where they were
called Cox processes in honor of their discoverer. While Cox processes
are sufficiently general to describe a plethora of phenomena, they remain
simple enough to permit explicit calculation, at least in some instances. As
an illustration, we will derive the probability of no points in an interval for
a Cox process in which the intensity function is a two-state Markov chain
in continuous time. The Cox process alternates between being “ON” and
“OFF.” When the underlying intensity is “ON,” points occur according to
a Poisson process of constant intensity $\lambda$. When the underlying process is
“OFF,” no points occur. We will call this basic model a $(0, \lambda)$ Cox process
to distinguish it from a later generalization. The $(0, \lambda)$ Cox process might
describe bursts of rainfall in a locale that alternates between dry spells and
wet ones, or arrivals to a queue from a supplier that randomly shuts down.
Rather straightforward extensions of the techniques that we will now use
in this simple case can be adapted to cover more complex models and
computations, as we will subsequently show.

We assume that the intensity process $\{\lambda(t); t \ge 0\}$ is a two-state Markov
chain in continuous time for which


$$\Pr\{\lambda(t + h) = \lambda \mid \lambda(t) = 0\} = \alpha h + o(h), \tag{7.1}$$

and

$$\Pr\{\lambda(t + h) = 0 \mid \lambda(t) = \lambda\} = \beta h + o(h). \tag{7.2}$$


This intensity is merely the constant $\lambda$ times the two-state birth and death
process of Section 3. As may be seen by allowing $t \to \infty$ in (3.13), such a
process has the limiting distribution $\Pr\{\lambda(\infty) = 0\} = \beta/(\alpha + \beta)$ and
$\Pr\{\lambda(\infty) = \lambda\} = \alpha/(\alpha + \beta)$. We will assume that the intensity process
begins with this limiting distribution, or explicitly, that $\Pr\{\lambda(0) = 0\} =
\beta/(\alpha + \beta)$ and $\Pr\{\lambda(0) = \lambda\} = \alpha/(\alpha + \beta)$. With this assumption, the
intensity process is stationary in the sense that $\Pr\{\lambda(t) = 0\} = \beta/(\alpha + \beta)$
and $\Pr\{\lambda(t) = \lambda\} = \alpha/(\alpha + \beta)$ for all $t \ge 0$. This stationarity carries over
to the Cox process $N(I)$ to imply that $\Pr\{N((0, t]) = k\} = \Pr\{N((s, s + t])
= k\}$ for all $s, t \ge 0$ and $k = 0, 1, \ldots$. We are interested in determining


$$f(t; \lambda) = \Pr\{N((0, t]) = 0\}.$$


Let

$$\Lambda(t) = \int_0^t \lambda(s)\,ds \tag{7.3}$$


and note the conditional Poisson probability

$$\Pr\{N((0, t]) = 0 \mid \lambda(s) \text{ for } s \le t\} = e^{-\Lambda(t)},$$


so that upon removing the conditioning via the law of total probability we
obtain


$$f(t; \lambda) = E[e^{-\Lambda(t)}] = f_0(t) + f_1(t), \tag{7.4}$$

where

$$f_0(t) = \Pr\{N((0, t]) = 0 \text{ and } \lambda(t) = 0\}, \tag{7.5}$$

and

$$f_1(t) = \Pr\{N((0, t]) = 0 \text{ and } \lambda(t) = \lambda\}. \tag{7.6}$$

Using an infinitesimal “last step analysis” similar to that used to derive the
Kolmogorov forward equations, we will derive a pair of first-order linear
differential equations for $f_0(t)$ and $f_1(t)$. To this end, by analyzing the
possibilities at time $t$ and using the law of total probability, we begin with

$$
\begin{aligned}
f_0(t + h) &= f_0(t)\Pr\{N((t, t + h]) = 0 \mid \lambda(t) = 0\}\Pr\{\lambda(t + h) = 0 \mid \lambda(t) = 0\}\\
&\quad + f_1(t)\Pr\{N((t, t + h]) = 0 \mid \lambda(t) = \lambda\}\Pr\{\lambda(t + h) = 0 \mid \lambda(t) = \lambda\}\\
&= f_0(t)[1 - \alpha h + o(h)] + f_1(t)e^{-\lambda h}\beta h + o(h),
\end{aligned}
$$

and

$$
\begin{aligned}
f_1(t + h) &= f_1(t)\Pr\{N((t, t + h]) = 0 \mid \lambda(t) = \lambda\}\Pr\{\lambda(t + h) = \lambda \mid \lambda(t) = \lambda\}\\
&\quad + f_0(t)\Pr\{N((t, t + h]) = 0 \mid \lambda(t) = 0\}\Pr\{\lambda(t + h) = \lambda \mid \lambda(t) = 0\}\\
&= f_1(t)e^{-\lambda h}[1 - \beta h + o(h)] + f_0(t)\alpha h + o(h).
\end{aligned}
$$

We rearrange the terms and use $e^{-\lambda h} = 1 - \lambda h + o(h)$ to get

$$f_0(t + h) - f_0(t) = -\alpha f_0(t)h + \beta f_1(t)h + o(h)$$

and

$$f_1(t + h) - f_1(t) = -(\beta + \lambda)f_1(t)h + \alpha f_0(t)h + o(h),$$


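which, after dividing by $h$ and letting $h$ tend to zero, become the
differential equations


$$\frac{df_0(t)}{dt} = -\alpha f_0(t) + \beta f_1(t) \tag{7.7}$$

and

$$\frac{df_1(t)}{dt} = -(\beta + \lambda)f_1(t) + \alpha f_0(t). \tag{7.8}$$

The initial conditions are

$$f_0(0) = \Pr\{\lambda(0) = 0\} = \beta/(\alpha + \beta) \tag{7.9}$$

and

$$f_1(0) = \Pr\{\lambda(0) = \lambda\} = \alpha/(\alpha + \beta). \tag{7.10}$$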


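Such coupled first-order linear differential equations are readily solved. In
our case, after carrying out the solution and simplifying the result, the
answer is $\Pr\{N((0, t]) = 0\} = f_0(t) + f_1(t) = f(t; \lambda)$, where


$$f(t; \lambda) = c_+ \exp\{-\mu_+ t\} + c_- \exp\{-\mu_- t\} \tag{7.11}$$

with

$$\mu_\pm = \tfrac{1}{2}\left\{(\lambda + \alpha + \beta) \pm \sqrt{(\lambda + \alpha + \beta)^2 - 4\alpha\lambda}\right\}, \tag{7.12}$$

$$c_+ = \frac{[\alpha\lambda/(\alpha + \beta)] - \mu_-}{\mu_+ - \mu_-}, \tag{7.13}$$

and

$$c_- = \frac{\mu_+ - [\alpha\lambda/(\alpha + \beta)]}{\mu_+ - \mu_-}. \tag{7.14}$$
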
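
As a numerical check on (7.11)-(7.14), the following sketch evaluates the closed form and compares it with a direct Runge-Kutta integration of (7.7)-(7.8) from the initial conditions (7.9)-(7.10). The coefficients are taken as $c_+ = (\pi\lambda - \mu_-)/(\mu_+ - \mu_-)$ and $c_- = 1 - c_+$ with $\pi = \alpha/(\alpha + \beta)$, which is what $f(0) = 1$ and $f'(0) = -\pi\lambda$ require; the function names and the step count are our own illustrative choices.

```python
import math

def cox_no_points(t, lam, alpha, beta):
    """Closed form f(t; lam) = Pr{N((0, t]) = 0} for a stationary (0, lam) Cox process."""
    s = lam + alpha + beta
    r = math.sqrt(s * s - 4.0 * alpha * lam)
    mu_plus, mu_minus = 0.5 * (s + r), 0.5 * (s - r)
    pi_lam = lam * alpha / (alpha + beta)
    c_plus = (pi_lam - mu_minus) / (mu_plus - mu_minus)
    c_minus = (mu_plus - pi_lam) / (mu_plus - mu_minus)
    return c_plus * math.exp(-mu_plus * t) + c_minus * math.exp(-mu_minus * t)

def cox_no_points_ode(t, lam, alpha, beta, steps=2000):
    """Integrate (7.7)-(7.8) with classical Runge-Kutta and return f0(t) + f1(t)."""
    def rhs(f0, f1):
        return (-alpha * f0 + beta * f1, -(beta + lam) * f1 + alpha * f0)

    f0, f1 = beta / (alpha + beta), alpha / (alpha + beta)
    h = t / steps
    for _ in range(steps):
        k1 = rhs(f0, f1)
        k2 = rhs(f0 + 0.5 * h * k1[0], f1 + 0.5 * h * k1[1])
        k3 = rhs(f0 + 0.5 * h * k2[0], f1 + 0.5 * h * k2[1])
        k4 = rhs(f0 + h * k3[0], f1 + h * k3[1])
        f0 += h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        f1 += h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return f0 + f1

print(cox_no_points(1.0, 2.0, 1.0, 1.0), cox_no_points_ode(1.0, 2.0, 1.0, 1.0))
```

The two printed values should agree to many decimal places, which supports the reading of the coefficients given above.
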
A Generalization Let $N$ be a Cox process driven by a 0-1 Markov
chain $\lambda(t)$, but now suppose that when the intensity process is in state 0,
the Cox process is, conditionally, a Poisson process of rate $\lambda_0$, and when
the intensity process is in state 1, then the Cox process is a Poisson
process of rate $\lambda_1$. The earlier Cox process had $\lambda_0 = 0$ and $\lambda_1 = \lambda$.
Without loss of generality, we assume $0 < \lambda_0 < \lambda_1$.

In order to evaluate $\Pr\{N((0, t]) = 0\}$, we write $N$ as the sum $N =
N' + N''$ of two independent processes, where $N'$ is a Poisson process of
constant rate $\lambda_0$ and $N''$ is a $(0, \lambda)$ Cox process with $\lambda = \lambda_1 - \lambda_0$. Then $N$
is zero if and only if both $N'$ and $N''$ are zero, whence


$$
\begin{aligned}
\Pr\{N((0, t]) = 0\} &= \Pr\{N'((0, t]) = 0\}\,\Pr\{N''((0, t]) = 0\}\\
&= e^{-\lambda_0 t} f(t; \lambda_1 - \lambda_0).
\end{aligned} \tag{7.15}
$$


Example The tensile strength $S(t)$ of a single fiber of length $t$ is often
assumed to follow a Weibull distribution of the form


$$\Pr\{S(t) > x\} = \exp\{-t\sigma x^{\delta}\}, \qquad \text{for } x > 0, \tag{7.16}$$


where $\delta$ and $\sigma$ are positive material constants. The explicit appearance of
the length $t$ in the exponent is an expression of a weakest-link size effect,
in which the fiber strength is viewed as the minimum strength of
independent sections. This theory suggests that the survivorship probability of
strength for a fiber of length $t$ should satisfy the relation


$$\Pr\{S(t) > x\} = [\Pr\{S(1) > x\}]^{t}, \qquad t > 0. \tag{7.17}$$


The Weibull distribution is the only type of distribution that is
concentrated on $0 \le x < \infty$ and satisfies (7.17).
However, a fiber under stress may fail from a surface flaw such as a
notch or scratch, or from an internal flaw such as a void or inclusion.
Where the diameter $d$ of the fiber varies along its length, the relative
magnitude of these two types of flaws will also vary, since the surface of the
fiber is proportional to $d$, while the volume is proportional to $d^2$. As a
simple generalization, suppose that the two types of flaws alternate and that the
changes from one flaw type to the other follow a two-state Markov chain
along the continuous length of the fiber. Further, suppose that a fiber of
constant type $i$ flaw, for $i = 0, 1$, will support the load $x$ with probability


$$\Pr\{S(t) > x\} = \exp\{-t\sigma_i x^{\delta_i}\}, \qquad x > 0,$$


where $\sigma_i$ and $\delta_i$ are positive constants.

We can evaluate the survivorship probability for the fiber having
Markov varying flaw types by bringing in an appropriate $(\lambda_0, \lambda_1)$ Cox
process. For a fixed $x > 0$, suppose that flaws that are weaker than $x$ will
occur along a fiber of constant flaw type $i$ according to a Poisson process
of rate $\lambda_i(x) = \sigma_i x^{\delta_i}$, for $i = 0, 1$. A fiber of length $t$ and having Markov
varying flaw types will carry a load of $x$ if and only if there are no flaws
weaker than $x$ along the fiber. Accordingly, for the random flaw type fiber,
using (7.15) we have


$$\Pr\{S(t) > x\} = e^{-\lambda_0(x)t} f(t; \lambda_1(x) - \lambda_0(x)). \tag{7.18}$$


Equation (7.18) may be evaluated numerically under a variety of as- 
sumptions for comparison with observed fiber tensile strengths. Where 
fibers having two flaw types are tested at several lengths, (7.18) may be 
used to extrapolate and predict strengths at lengths not measured. 

It is sometimes more meaningful to reparametrize according to
$\pi = \alpha/(\alpha + \beta)$ and $\tau = \alpha + \beta$. Here $\pi$ is the long run fraction of fiber
length for which the applicable flaw distribution is of type 1, and $1 - \pi$
is the similar fraction of type 0 flaw behavior. The second parameter $\tau$ is
a measure of the rapidity with which the flaw types alternate. In
particular, when $\tau = 0$, the diameter or flaw type remains in whichever state it
began, and the survivor probability reduces to the mixture


$$\Pr\{S(t) > x\} = \pi e^{-\lambda_1(x)t} + (1 - \pi)e^{-\lambda_0(x)t}. \tag{7.19}$$


On the other hand, at $\tau = \infty$, the flaw type process alternates instantly, and
the survivor probability simplifies to


$$\Pr\{S(t) > x\} = \exp\{-t[\pi\lambda_1(x) + (1 - \pi)\lambda_0(x)]\}. \tag{7.20}$$

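
Equation (7.18) and its two limiting forms can be compared numerically. The sketch below evaluates (7.18) with $\alpha = \pi\tau$ and $\beta = (1 - \pi)\tau$ and checks it against (7.19) for a very small $\tau$ and against (7.20) for a very large $\tau$. The material constants, load, and length used here are hypothetical values chosen so that $\lambda_0(x) < \lambda_1(x)$; the function names are ours.

```python
import math

def cox_no_points(t, lam, alpha, beta):
    """f(t; lam) of (7.11) for a stationary (0, lam) Cox process."""
    s = lam + alpha + beta
    r = math.sqrt(s * s - 4.0 * alpha * lam)
    mu_plus, mu_minus = 0.5 * (s + r), 0.5 * (s - r)
    pi_lam = lam * alpha / (alpha + beta)
    c_plus = (pi_lam - mu_minus) / (mu_plus - mu_minus)
    c_minus = (mu_plus - pi_lam) / (mu_plus - mu_minus)
    return c_plus * math.exp(-mu_plus * t) + c_minus * math.exp(-mu_minus * t)

def fiber_survival(t, x, sigma, delta, pi, tau):
    """Pr{S(t) > x} from (7.18); sigma and delta are pairs indexed by flaw type."""
    lam0 = sigma[0] * x ** delta[0]
    lam1 = sigma[1] * x ** delta[1]
    alpha, beta = pi * tau, (1.0 - pi) * tau
    return math.exp(-lam0 * t) * cox_no_points(t, lam1 - lam0, alpha, beta)

# Hypothetical constants: lambda_0(1.5) = 2.25 and lambda_1(1.5) = 6.75.
args = dict(t=0.5, x=1.5, sigma=(1.0, 2.0), delta=(2.0, 3.0), pi=0.4)
lam0, lam1 = 2.25, 6.75
mixture = 0.4 * math.exp(-lam1 * 0.5) + 0.6 * math.exp(-lam0 * 0.5)    # (7.19)
averaged = math.exp(-0.5 * (0.4 * lam1 + 0.6 * lam0))                  # (7.20)
print(fiber_survival(tau=1e-6, **args), mixture)
print(fiber_survival(tau=1e6, **args), averaged)
```

Intermediate values of $\tau$ give survival probabilities that are at least as large as the averaged form (7.20), by Jensen's inequality applied to $E[e^{-\Lambda(t)}]$.
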
The probability distribution for $N((0, t])$ Let $\Lambda(t)$ be the
cumulative intensity for a Cox process and suppose that we have evaluated


$$g(t; \theta) = E[e^{-(1 - \theta)\Lambda(t)}], \qquad 0 < \theta < 1.$$

For a $(0, \lambda)$ Cox process, for instance, $g(t; \theta) = f(t; (1 - \theta)\lambda)$, where $f$ is
defined in (7.11). Upon expanding as a power series in $\theta$, according to


$$
\begin{aligned}
g(t; \theta) &= E\left[e^{-\Lambda(t)} \sum_{k=0}^{\infty} \frac{\Lambda(t)^k}{k!}\theta^k\right]\\
&= \sum_{k=0}^{\infty} E\left[e^{-\Lambda(t)} \frac{\Lambda(t)^k}{k!}\right]\theta^k\\
&= \sum_{k=0}^{\infty} \Pr\{N((0, t]) = k\}\,\theta^k,
\end{aligned}
$$


we see that the coefficient of $\theta^k$ in the power series is $\Pr\{N((0, t]) = k\}$. In
principle then, the probability distribution for the points in an interval in
a Cox process can be determined in any particular instance.
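
One way to carry out this extraction numerically is to sample $g(t; \theta)$ at roots of unity and apply a discrete Fourier sum, which recovers the leading power-series coefficients. This is our own choice of technique, not one the text prescribes, and the sketch below assumes a $(0, \lambda)$ Cox process with $\alpha = \beta = 1$ and $\lambda = 2$, for which the square root in (7.12) does not vanish on the unit circle. The function names and the number of sample points `n` are illustrative.

```python
import cmath
import math

def f_no_points(t, lam, alpha, beta):
    """f(t; lam) of (7.11); lam is allowed to be complex here."""
    s = lam + alpha + beta
    r = cmath.sqrt(s * s - 4.0 * alpha * lam)
    mu_plus, mu_minus = 0.5 * (s + r), 0.5 * (s - r)
    pi_lam = lam * alpha / (alpha + beta)
    c_plus = (pi_lam - mu_minus) / r
    c_minus = (mu_plus - pi_lam) / r
    return c_plus * cmath.exp(-mu_plus * t) + c_minus * cmath.exp(-mu_minus * t)

def count_distribution(t, lam, alpha, beta, n=64):
    """Approximate Pr{N((0, t]) = k} for k = 0..n-1 as coefficients of g(t; theta)."""
    roots = [cmath.exp(2j * math.pi * j / n) for j in range(n)]
    values = [f_no_points(t, (1.0 - w) * lam, alpha, beta) for w in roots]
    probs = []
    for k in range(n):
        coeff = sum(v * w ** (-k) for v, w in zip(values, roots)) / n
        probs.append(coeff.real)
    return probs

probs = count_distribution(t=1.0, lam=2.0, alpha=1.0, beta=1.0)
print([round(p, 4) for p in probs[:5]])
```

The coefficient for $k = 0$ reproduces $f(t; \lambda)$, and the mean of the recovered distribution can be compared with $E[N((0, t])] = \pi\lambda t$.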


Exercises 
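7.1. Suppose that a $(0, \lambda)$ Cox process has $\alpha = \beta = 1$ and $\lambda = 2$. Show
that $\mu_\pm = 2 \pm \sqrt{2}$, and $c_- = \frac{1}{4}(2 + \sqrt{2}) = 1 - c_+$, whence


$$\Pr\{N((0, t]) = 0\} = e^{-2t}\left[\cosh(\sqrt{2}\,t) + \tfrac{1}{\sqrt{2}}\sinh(\sqrt{2}\,t)\right].$$


7.2. Suppose that a $(0, \lambda)$ Cox process has $\alpha = \beta = 1$ and $\lambda = 2$. Show
that


$$f_0(t) = \frac{1 + \sqrt{2}}{4}e^{-(2 - \sqrt{2})t} + \frac{1 - \sqrt{2}}{4}e^{-(2 + \sqrt{2})t}$$

and

$$f_1(t) = \tfrac{1}{4}e^{-(2 - \sqrt{2})t} + \tfrac{1}{4}e^{-(2 + \sqrt{2})t}$$


satisfy the differential equations (7.7) and (7.8) with the initial conditions
(7.9) and (7.10).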
Problems


7.1. Consider a stationary Cox process driven by a two-state Markov
chain. Let $\pi = \alpha/(\alpha + \beta)$ be the probability that the process begins in
state $\lambda$.


(a) By using the transition probabilities given in (3.14a-d), show that
$\Pr\{\lambda(t) = \lambda\} = \pi$ for all $t > 0$.
(b) Show that $E[N((0, t])] = \pi\lambda t$ for all $t > 0$.


7.2. The excess life $\gamma(t)$ in a point process is the random length of the
duration from time $t$ until the next event. Show that the cumulative
distribution function for the excess life in a Cox process is given by
$\Pr\{\gamma(t) \le x\} = 1 - \Pr\{N((t, t + x]) = 0\}$.


7.3. Let $T$ be the time to the first event in a stationary $(0, \lambda)$ Cox
process. Find the probability density function $\phi(t)$ for $T$. Show that when
$\alpha = \beta = 1$ and $\lambda = 2$, this density function simplifies to $\phi(t) = \exp\{-2t\}
\cosh(\sqrt{2}\,t)$.


7.4. Let $T$ be the time to the first event in a stationary $(0, \lambda)$ Cox process.
Find the expected value $E[T]$. Show that $E[T] = \frac{3}{2}$ when $\alpha = \beta = 1$ and
$\lambda = 2$. What is the average duration between events in this process?


7.5. Determine the conditional probability of no points in the interval
$(t, t + s]$, given that there are no points in the interval $(0, t]$ for a
stationary Cox process driven by a two-state Markov chain. Establish the limit


$$\lim_{t \to \infty} \Pr\{N((t, t + s]) = 0 \mid N((0, t]) = 0\} = e^{-\mu_- s}, \qquad s > 0.$$


7.6. Show that the Laplace transform


$$\phi(s; \lambda) = \int_0^{\infty} e^{-st} f(t; \lambda)\,dt$$

is given by

$$\phi(s; \lambda) = \frac{s + (1 - \pi)\lambda + \tau}{s^2 + (\tau + \lambda)s + \pi\tau\lambda},$$

where $\tau = \alpha + \beta$ and $\pi = \alpha/(\alpha + \beta)$. Evaluate the limit (a) as $\tau \to \infty$,
and (b) as $\tau \to 0$.


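7.7. Consider a $(0, \lambda)$ stationary Cox process with $\alpha = \beta = 1$ and
$\lambda = 2$. Show that $g(t; \theta) = f(t; (1 - \theta)\lambda)$ is given by


$$g(t; \theta) = e^{-(2 - \theta)t}\left[\cosh(Rt) + \frac{1}{R}\sinh(Rt)\right],$$


where

$$R = \sqrt{\theta^2 - 2\theta + 2}.$$

Use this to evaluate $\Pr\{N((0, 1]) = 1\}$.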


7.8. Consider a stationary $(0, \lambda)$ Cox process. A long duration during
which no events were observed would suggest that the intensity process is
in state 0. Show that


$$\Pr\{\lambda(t) = 0 \mid N((0, t]) = 0\} = \frac{f_0(t)}{f(t)},$$

where $f_0(t)$ is defined in (7.5).


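7.9. Show that


$$f_0(t) = a_+ e^{-\mu_+ t} + a_- e^{-\mu_- t}$$


and

$$f_1(t) = b_+ e^{-\mu_+ t} + b_- e^{-\mu_- t}$$

with

$$a_\pm = \tfrac{1}{2}(1 - \pi)\left[1 \mp \frac{\alpha + \beta + \lambda}{R}\right], \qquad
b_\pm = \tfrac{1}{2}\pi\left[1 \pm \frac{\lambda - \alpha - \beta}{R}\right], \qquad
R = \sqrt{(\alpha + \beta + \lambda)^2 - 4\alpha\lambda},$$


satisfy the differential equations (7.7) and (7.8) subject to the initial
conditions (7.9) and (7.10).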
7.10. Consider a stationary $(0, \lambda)$ Cox process.
(a) Show that

$$\Pr\{N((0, h]) > 0, N((h, h + t]) = 0\} = f(t; \lambda) - f(t + h; \lambda),$$

whence

$$\Pr\{N((h, h + t]) = 0 \mid N((0, h]) > 0\} = \frac{f(t; \lambda) - f(t + h; \lambda)}{1 - f(h; \lambda)}.$$

(b) Establish the limit

$$\lim_{h \to 0} \Pr\{N((h, h + t]) = 0 \mid N((0, h]) > 0\} = \frac{f'(t; \lambda)}{f'(0; \lambda)},$$

where

$$f'(t; \lambda) = \frac{df(t; \lambda)}{dt}.$$


(c) We interpret the limit in (b) as the conditional probability
$\Pr\{N((0, t]) = 0 \mid \text{Event occurs at time } 0\}$.
Show that

$$\Pr\{N((0, t]) = 0 \mid \text{Event at time } 0\} = p_+ e^{-\mu_+ t} + p_- e^{-\mu_- t},$$

where

$$p_+ = \frac{c_+\mu_+}{c_+\mu_+ + c_-\mu_-}, \qquad p_- = \frac{c_-\mu_-}{c_+\mu_+ + c_-\mu_-}.$$

(d) Let $\tau$ be the time to the first event in $(0, \infty)$ in a stationary $(0, \lambda)$ Cox
process with $\alpha = \beta = 1$ and $\lambda = 2$. Show that

$$E[\tau \mid \text{Event at time } 0] = 1.$$


Why does this differ from the result in Problem 7.4?


7.11. A Stop-and-Go Traveler The velocity $V(t)$ of a stop-and-go
traveler is described by a two-state Markov chain. The successive durations in
which the traveler is stopped are independent and exponentially
distributed with parameter $\alpha$, and they alternate with independent exponentially
distributed sojourns, parameter $\beta$, during which the traveler moves at unit
speed. Take the stationary case in which $\Pr\{V(0) = 1\} = \pi = \alpha/(\alpha + \beta)$.
The distance traveled in time $t$ is the integral of the velocity:


$$S(t) = \int_0^t V(u)\,du.$$


Show that

$$E[e^{-\theta S(t)}] = f(t; \theta), \qquad \theta \text{ real}.$$

(This is the Laplace transform of the probability density function of $S(t)$.)


7.12. Let $\tau$ be the time of the first event in a $(0, \lambda)$ Cox process. Let the
0 and $\lambda$ states represent “OFF” and “ON,” respectively.

(a) Show that the total duration in the $(0, \tau]$ interval that the system is
ON is exponentially distributed with parameter $\lambda$ and does not
depend on $\alpha$, $\beta$ or the starting state.

(b) Assume that the process begins in the OFF state. Show that the total
duration in the $(0, \tau]$ interval that the system is OFF has the same
distribution as

$$\sum_{k=0}^{N(\zeta)} \eta_k,$$

where $\zeta$ is exponentially distributed with parameter $\lambda$, $N(t)$ is a Poisson
process with parameter $\beta$, and $\eta_0, \eta_1, \ldots$ are independent and
exponentially distributed with parameter $\alpha$.


Chapter VII 
Renewal Phenomena 


1. Definition of a Renewal Process 
and Related Concepts 


Renewal theory began with the study of stochastic systems whose
evolution through time was interspersed with renewals or regeneration times
when, in a statistical sense, the process began anew. Today, the subject is
viewed as the study of general functions of independent, identically
distributed, nonnegative random variables representing the successive
intervals between renewals. The results are applicable in a wide variety of both
theoretical and practical probability models.

A renewal (counting) process $\{N(t), t \ge 0\}$ is a nonnegative
integer-valued stochastic process that registers the successive occurrences of an
event during the time interval $(0, t]$, where the times between consecutive
events are positive, independent, identically distributed random variables.
Let the successive durations between events be $\{X_k\}_{k=1}^{\infty}$ (often representing
the lifetimes of some units successively placed into service) such that $X_i$
is the elapsed time from the $(i - 1)$st event until the occurrence of the $i$th
event. We write


$$F(x) = \Pr\{X_k \le x\}, \qquad k = 1, 2, 3, \ldots,$$

for the common probability distribution of $X_1, X_2, \ldots$. A basic stipulation
for renewal processes is $F(0) = 0$, signifying that $X_1, X_2, \ldots$ are positive
random variables. We refer to


$$W_n = X_1 + X_2 + \cdots + X_n, \quad n \ge 1 \qquad (W_0 = 0 \text{ by convention}), \tag{1.1}$$


as the waiting time until the occurrence of the $n$th event.
The relation between the interoccurrence times $\{X_k\}$ and the renewal
counting process $\{N(t), t \ge 0\}$ is depicted in Figure 1.1. Note formally that


$$N(t) = \text{number of indices } n \text{ for which } 0 < W_n \le t. \tag{1.2}$$


Figure 1.1. The relation between the interoccurrence times $X_k$ and the
renewal counting process $N(t)$.


In common practice the counting process $\{N(t), t \ge 0\}$ and the partial
sum process $\{W_n, n \ge 0\}$ are interchangeably called the “renewal
process.” The prototypical renewal model involves successive
replacements of light bulbs. A bulb is installed for service at time $W_0 = 0$, fails at
time $W_1 = X_1$, and is then exchanged for a fresh bulb. The second bulb
fails at time $W_2 = X_1 + X_2$ and is replaced by a third bulb. In general, the
$n$th bulb burns out at time $W_n = X_1 + \cdots + X_n$ and is immediately
replaced, and the process continues. It is natural to assume that the
successive lifetimes are statistically independent, with probabilistically identical
characteristics in that

$$\Pr\{X_k \le x\} = F(x) \qquad \text{for } k = 1, 2, \ldots.$$


In this process $N(t)$ records the number of light-bulb replacements up to
time $t$.

The principal objective of renewal theory is to derive properties of
certain random variables associated with $\{N(t)\}$ and $\{W_n\}$ from knowledge of
the interoccurrence distribution $F$. For example, it is of significance and
relevance to compute the expected number of renewals for the time
duration $(0, t]$. The quantity


$$E[N(t)] = M(t)$$


is called the renewal function. To this end, several pertinent relationships
and formulas are worth recording. In principle, the probability law of
$W_n = X_1 + \cdots + X_n$ can be calculated in accordance with the convolution
formula


$$\Pr\{W_n \le x\} = F_n(x),$$


where $F_1(x) = F(x)$ is assumed known or prescribed, and then


$$F_n(x) = \int_0^x F_{n-1}(x - y)\,dF(y) = \int_0^x F(x - y)\,dF_{n-1}(y).$$


Such convolution formulas were reviewed in I, Section 2.5.
The fundamental connecting link between the waiting time process
$\{W_n\}$ and the renewal counting process $\{N(t)\}$ is the observation that


$$N(t) \ge k \quad \text{if and only if} \quad W_k \le t. \tag{1.3}$$


In words, equation (1.3) asserts that the number of renewals up to time $t$
is at least $k$ if and only if the $k$th renewal occurred on or before time $t$.
Since this equivalence is the basis for much that follows, the reader should
verify instances of it by referring to Figure 1.1.
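It follows from (1.3) that

$$
\begin{aligned}
\Pr\{N(t) \ge k\} &= \Pr\{W_k \le t\}\\
&= F_k(t), \qquad t \ge 0,\ k = 1, 2, \ldots,
\end{aligned} \tag{1.4}
$$

and consequently,

$$
\begin{aligned}
\Pr\{N(t) = k\} &= \Pr\{N(t) \ge k\} - \Pr\{N(t) \ge k + 1\}\\
&= F_k(t) - F_{k+1}(t), \qquad t \ge 0,\ k = 1, 2, \ldots.
\end{aligned} \tag{1.5}
$$


For the renewal function $M(t) = E[N(t)]$ we sum the tail probabilities in
the manner $E[N(t)] = \sum_{k=1}^{\infty} \Pr\{N(t) \ge k\}$, as derived in I, (5.2), and then
use (1.4) to obtain


$$
\begin{aligned}
M(t) = E[N(t)] &= \sum_{k=1}^{\infty} \Pr\{N(t) \ge k\}\\
&= \sum_{k=1}^{\infty} \Pr\{W_k \le t\} = \sum_{k=1}^{\infty} F_k(t).
\end{aligned} \tag{1.6}
$$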
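
Formula (1.6) can be tried out on a case where the answer is known. The sketch below assumes exponential lifetimes with rate `lam`, for which $W_k$ has a gamma (Erlang) distribution and the renewal function is known to be $M(t) = \lambda t$; the truncation tolerance and the function names are our own choices.

```python
import math

def erlang_cdf(k, lam, t):
    """F_k(t) = Pr{W_k <= t} for exponential lifetimes with rate lam."""
    term, partial = math.exp(-lam * t), 0.0
    for j in range(k):
        partial += term            # adds exp(-lam t) (lam t)^j / j!
        term *= lam * t / (j + 1)
    return 1.0 - partial

def renewal_function(lam, t, tol=1e-12, max_terms=10_000):
    """M(t) as the sum of F_k(t) over k >= 1, truncated when terms are negligible."""
    total = 0.0
    for k in range(1, max_terms + 1):
        fk = erlang_cdf(k, lam, t)
        total += fk
        if fk < tol:
            break
    return total

print(renewal_function(lam=2.0, t=2.5))                        # close to lam * t = 5
print(erlang_cdf(3, 2.0, 2.5) - erlang_cdf(4, 2.0, 2.5))       # Pr{N(t) = 3} via (1.5)
```

The second printed value is the Poisson probability $e^{-5}5^3/3!$, as (1.5) predicts for a Poisson process.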


There are a number of other random variables of interest in renewal
theory. Three of these are the excess life (also called the excess random
variable), the current life (also called the age random variable), and the
total life, defined, respectively, by


$$
\begin{aligned}
\gamma_t &= W_{N(t)+1} - t &&\text{(excess or residual lifetime)},\\
\delta_t &= t - W_{N(t)} &&\text{(current life or age random variable)},\\
\beta_t &= \gamma_t + \delta_t &&\text{(total life)}.
\end{aligned}
$$


A pictorial description of these random variables is given in Figure 1.2.


Figure 1.2 The excess life $\gamma_t$, the current life $\delta_t$, and the total life $\beta_t$.
An important identity enables us to evaluate the mean of $W_{N(t)+1}$ in terms
of the mean lifetime $\mu = E[X_1]$ of each unit and the renewal function
$M(t)$. Namely, it is true for every renewal process that


$$
\begin{aligned}
E[W_{N(t)+1}] &= E[X_1 + \cdots + X_{N(t)+1}]\\
&= E[X_1]E[N(t) + 1],
\end{aligned}
$$

or

$$E[W_{N(t)+1}] = \mu\{M(t) + 1\}. \tag{1.7}$$


At first glance, this identity resembles the formula given in II, (3.9) for the
mean of a random sum, which asserts that $E[X_1 + \cdots + X_N] = E[X_1]E[N]$
when $N$ is an integer-valued random variable that is independent of $X_1,
X_2, \ldots$. The random sum approach does not apply in the current context,
however, the crucial difference being that the random number of
summands $N(t) + 1$ is not independent of the summands themselves. Indeed,
in Section 3, on the Poisson process viewed as a renewal process, we will
show that the last summand $X_{N(t)+1}$ has a mean that approaches twice the
unconditional mean $\mu = E[X_1]$ for $t$ large. For this reason, it is not correct,
in particular, that $E[W_{N(t)}]$ can be evaluated as the product of $E[X_1]$ and
$E[N(t)]$. In view of these comments, the identity expressed in equation
(1.7) becomes more intriguing and remarkable.

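To derive (1.7), we will use the fundamental equivalence (1.3) in the
form


$$N(t) \ge j - 1 \quad \text{if and only if} \quad X_1 + \cdots + X_{j-1} \le t,$$

which expressed in terms of indicator random variables becomes

$$\mathbf{1}\{N(t) \ge j - 1\} = \mathbf{1}\{X_1 + \cdots + X_{j-1} \le t\}.$$


Since this indicator random variable is a function only of the random
variables $X_1, \ldots, X_{j-1}$, it is independent of $X_j$, and thus we may evaluate


$$
\begin{aligned}
E[X_j \mathbf{1}\{X_1 + \cdots + X_{j-1} \le t\}] &= E[X_j]\,E[\mathbf{1}\{X_1 + \cdots + X_{j-1} \le t\}]\\
&= E[X_1]\Pr\{X_1 + \cdots + X_{j-1} \le t\}\\
&= \mu F_{j-1}(t).
\end{aligned} \tag{1.8}
$$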


With (1.8) in hand, the evaluation of the equivalence expressed in (1.7)
becomes straightforward. We have


$$
\begin{aligned}
E[W_{N(t)+1}] &= E[X_1 + \cdots + X_{N(t)+1}]\\
&= E[X_1] + E\left[\sum_{j=2}^{N(t)+1} X_j\right]\\
&= \mu + \sum_{j=2}^{\infty} E[X_j \mathbf{1}\{N(t) + 1 \ge j\}]\\
&= \mu + \sum_{j=2}^{\infty} E[X_j \mathbf{1}\{X_1 + \cdots + X_{j-1} \le t\}]\\
&= \mu + \mu\sum_{j=2}^{\infty} F_{j-1}(t) &&\text{(using (1.8))}\\
&= \mu[1 + M(t)] &&\text{(using (1.6))}.
\end{aligned}
$$

Some examples of the use of the identity (1.7) will appear in the exercises,
and an alternative proof in the case of a discrete renewal process can be
found in Section 6.
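
The identity (1.7) can also be illustrated by simulation. The sketch below assumes lifetimes that are uniformly distributed on (0, 1), so that $\mu = 0.5$; this distribution, the horizon $t = 3$, the number of trials, and the seed are all illustrative choices of ours. Both sides of (1.7) are estimated from the same simulated renewal processes.

```python
import random

def simulate_identity(t, trials=20000, seed=1):
    """Estimate E[W_{N(t)+1}] and mu * (E[N(t)] + 1) for Uniform(0, 1) lifetimes."""
    rng = random.Random(seed)
    mu = 0.5                       # mean of a Uniform(0, 1) lifetime
    total_w, total_n = 0.0, 0
    for _ in range(trials):
        w, n = 0.0, 0
        while True:
            w += rng.random()      # add the next lifetime
            if w > t:              # first partial sum beyond t is W_{N(t)+1}
                break
            n += 1                 # this renewal occurred at or before t
        total_w += w
        total_n += n
    return total_w / trials, mu * (total_n / trials + 1.0)

lhs, rhs = simulate_identity(3.0)
print(lhs, rhs)
```

The two estimates should differ only by sampling error, even though $N(t) + 1$ is not independent of the summands.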


Exercises 


1.1. Verify the following equivalences for the age and the excess life in
a renewal process $N(t)$:


$$\gamma_t > x \quad \text{if and only if} \quad N(t + x) - N(t) = 0;$$

and for $0 < x < t$,

$$\delta_t > x \quad \text{if and only if} \quad N(t) - N(t - x) = 0.$$

Why is the condition $x < t$ important in the second case but not the first?
1.2. Consider a renewal process in which the interoccurrence times
have an exponential distribution with parameter $\lambda$:

$$f(x) = \lambda e^{-\lambda x}, \quad \text{and} \quad F(x) = 1 - e^{-\lambda x} \quad \text{for } x > 0.$$


Calculate $F_2(t)$ by carrying out the appropriate convolution [see the
equation just prior to (1.3)] and then determine $\Pr\{N(t) = 1\}$ from
equation (1.5).
1.3. Which of the following are true statements?


(a) $N(t) < k$ if and only if $W_k > t$.
(b) $N(t) \le k$ if and only if $W_k \ge t$.
(c) $N(t) > k$ if and only if $W_k < t$.


1.4. Consider a renewal process for which the lifetimes X,, X,, ... are 
discrete random variables. having the Poisson distribution with mean A. 
That is, 


—A\n 


Pr{X, =n} = 


forn=0,1,.... 
n! 


(a) What is the distribution of the waiting time W,? 
(b) Determine Pr{N(t) = k}. 
Problems 


1.1. Verify the following equivalences for the age and the excess life in
a renewal process $N(t)$: (Assume $t > x$.)

$$\begin{aligned}
\Pr\{\delta_t \ge x, \gamma_t > y\} &= \Pr\{N(t - x) = N(t + y)\} \\
&= \sum_{k=0}^{\infty} \Pr\{W_k < t - x,\ W_{k+1} > t + y\} \\
&= [1 - F(t + y)] + \sum_{k=1}^{\infty} \int_0^{t-x} [1 - F(t + y - z)]\, dF_k(z).
\end{aligned}$$

Carry out the evaluation when the interoccurrence times are exponentially
distributed with parameter $\lambda$, so that $dF_k$ is the gamma density

$$dF_k(z) = \frac{\lambda^k z^{k-1}}{(k - 1)!} e^{-\lambda z}\, dz \qquad \text{for } z > 0.$$

1.2. From equation (1.5), and for $k \ge 1$, verify that

$$\begin{aligned}
\Pr\{N(t) = k\} &= \Pr\{W_k \le t < W_{k+1}\} \\
&= \int_0^t [1 - F(t - x)]\, dF_k(x),
\end{aligned}$$

and carry out the evaluation when the interoccurrence times are
exponentially distributed with parameter $\lambda$, so that $dF_k$ is the gamma density

$$dF_k(z) = \frac{\lambda^k z^{k-1}}{(k - 1)!} e^{-\lambda z}\, dz \qquad \text{for } z > 0.$$


1.3. A fundamental identity involving the renewal function, valid for all
renewal processes, is

$$E[W_{N(t)+1}] = E[X_1](M(t) + 1).$$

See equation (1.7). Using this identity, show that the mean excess life can
be evaluated in terms of the renewal function via the relation

$$E[\gamma_t] = E[X_1](1 + M(t)) - t.$$

1.4. Let $\gamma_t$ be the excess life and $\delta_t$ the age in a renewal process having
interoccurrence distribution function $F(x)$. Determine the conditional
probability $\Pr\{\gamma_t > y \mid \delta_t = x\}$ and the conditional mean $E[\gamma_t \mid \delta_t = x]$.


2. Some Examples of Renewal Processes 


Stochastic models often contain random times at which they, or some part 
of them, begin afresh in a statistical sense. These renewal instants form 
natural embedded renewal processes, and they are found in many diverse 
fields of applied probability including branching processes, insurance risk 
models, phenomena of population growth, evolutionary genetic mecha- 
nisms, engineering systems, and econometric structures. When a renewal 
process is discovered embedded within a model, the powerful results of 
renewal theory become available for deducing implications. 


2.1. Brief Sketches of Renewal Situations


The synopses that follow suggest the wide scope and diverse contexts in
which renewal processes arise. Several of the examples will be studied in
more detail in later sections.

(a) Poisson Processes A Poisson process $\{N(t), t \ge 0\}$ with parameter
$\lambda$ is a renewal counting process having the exponential interoccurrence
distribution

$$F(x) = 1 - e^{-\lambda x}, \qquad x \ge 0,$$

as established in V, Theorem 3.2. This particular renewal process possesses
a host of special features, highlighted later in Section 3.


(b) Counter Processes The times between successive electrical impulses
or signals impinging on a recording device (counter) are often assumed
to form a renewal process. Most physically realizable counters
lock for some duration immediately upon registering an impulse and will
not record impulses arriving during this dead period. Impulses are
recorded only when the counter is free (i.e., unlocked). Under quite
reasonable assumptions, the sequence of events of the times of recorded
impulses forms a renewal process, but it should be emphasized that the
renewal process of recorded impulses is a secondary renewal process
derived from the original renewal process comprising the totality of all
arriving impulses.


(c) Traffic Flow The distances between successive cars on an indefi- 
nitely long single-lane highway are often assumed to form a renewal 
process. So also are the time durations between consecutive cars passing 
a fixed location. 


(d) Renewal Processes Associated with Queues In a single-server 
queueing process there are embedded many natural renewal processes. We 
cite two examples: 


(i) If customer arrival times form a renewal process, then the times of 
the starts of successive busy periods generate a second renewal 
process. 

(ii) For the situation in which the input process (the arrival pattern of 
customers) is Poisson, the successive moments in which the server
passes from a busy to a free state determine a renewal process. 


(e) Inventory Systems In the analysis of most inventory processes it is 
customary to assume that the pattern of demands forms a renewal process. 
Most of the standard inventory policies induce renewal sequences, e.g., 
the times of replenishment of stock.


(f) Renewal Processes in Markov Chains Let $Z_0, Z_1, \ldots$ be a recurrent
Markov chain. Suppose $Z_0 = i$, and consider the durations (elapsed number
of steps) between successive visits to state $i$. Specifically, let $W_0 = 0$,

$$W_1 = \min\{n > 0;\ Z_n = i\},$$

and

$$W_{k+1} = \min\{n > W_k;\ Z_n = i\}, \qquad k = 1, 2, \ldots.$$

Since each of these times is computed from the same starting state $i$, the
Markov property guarantees that $X_k = W_k - W_{k-1}$ are independent and
identically distributed, and thus $\{X_k\}$ generates a renewal process.


2.2. Block Replacement 


Consider a light bulb whose life, measured in discrete units, is a random
variable $X$, where $\Pr\{X = k\} = p_k$ for $k = 1, 2, \ldots$. Assuming that one
starts with a fresh bulb and that each bulb is replaced by a new one when
it burns out, let $M(n) = E[N(n)]$ be the expected number of replacements
up to time $n$.

Because of economies of scale, in a large building such as a factory or
office it is often cheaper, on a per bulb basis, to replace all the bulbs, failed
or not, than it is to replace a single bulb. A block replacement policy
attempts to take advantage of this reduced cost by fixing a block period $K$
and then replacing bulbs as they fail during periods $1, 2, \ldots, K - 1$, and
replacing all bulbs, failed or not, in period $K$. This strategy is also known
as “group relamping.” If $c_1$ is the per bulb block replacement cost and
$c_2$ is the per bulb failure replacement cost ($c_1 < c_2$), then the mean total
cost during the block replacement cycle is $c_1 + c_2 M(K - 1)$, where
$M(K - 1) = E[N(K - 1)]$ is the mean number of failure replacements.
Since the block replacement cycle consists of $K$ periods, the mean total
cost per bulb per unit time is

$$\theta(K) = \frac{c_1 + c_2 M(K - 1)}{K}.$$

If we can determine the renewal function $M(n)$ from the life distribution
$\{p_k\}$, then we can choose the block period $K = K^*$ so as to minimize the
cost rate $\theta(K)$. Of course, this cost must be compared to the cost of
replacing only upon failure.

The renewal function $M(n)$, or expected number of replacements up to
time $n$, solves the equation

$$M(n) = F_X(n) + \sum_{k=1}^{n-1} p_k M(n - k) \qquad \text{for } n = 1, 2, \ldots,$$

where $F_X(n) = p_1 + \cdots + p_n$ is the distribution function of the bulb life.


To derive this equation, condition on the life $X_1$ of the first bulb. If it fails
after time $n$, there are no replacements during periods $[1, 2, \ldots, n]$. On
the other hand, if it fails at time $k \le n$, then we have its failure plus, on
the average, $M(n - k)$ additional replacements during the interval
$[k + 1, k + 2, \ldots, n]$. Using the law of total probability to sum these
contributions, we obtain

$$\begin{aligned}
M(n) &= \sum_{k=n+1}^{\infty} p_k(0) + \sum_{k=1}^{n} p_k[1 + M(n - k)] \\
&= F_X(n) + \sum_{k=1}^{n-1} p_k M(n - k) \qquad [\text{because } M(0) = 0],
\end{aligned}$$

as asserted.

Thus we determine

$$\begin{aligned}
M(1) &= F_X(1), \\
M(2) &= F_X(2) + p_1 M(1), \\
M(3) &= F_X(3) + p_1 M(2) + p_2 M(1),
\end{aligned}$$

and so on.


To consider a numerical example, suppose that

$$p_1 = 0.1, \quad p_2 = 0.4, \quad p_3 = 0.3, \quad \text{and} \quad p_4 = 0.2,$$

and

$$c_1 = 2 \quad \text{and} \quad c_2 = 3.$$

Then

$$\begin{aligned}
M(1) &= p_1 = 0.1, \\
M(2) &= (p_1 + p_2) + p_1 M(1) = (0.1 + 0.4) + 0.1(0.1) = 0.51, \\
M(3) &= (p_1 + p_2 + p_3) + p_1 M(2) + p_2 M(1) \\
&= (0.1 + 0.4 + 0.3) + 0.1(0.51) + 0.4(0.1) = 0.891, \\
M(4) &= (p_1 + p_2 + p_3 + p_4) + p_1 M(3) + p_2 M(2) + p_3 M(1) \\
&= 1 + 0.1(0.891) + 0.4(0.51) + 0.3(0.1) = 1.3231.
\end{aligned}$$
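The recursion for $M(n)$ is easy to mechanize. The following Python sketch is one possible way to do so; the function name `renewal_function` and the convention that `p[k-1]` stores $p_k$ are choices made for this illustration and do not come from the text.

```python
def renewal_function(p, n_max):
    """Return [M(0), M(1), ..., M(n_max)] for a discrete lifetime pmf.

    p[k-1] is Pr{X = k}; lifetimes longer than len(p) have probability 0.
    """
    M = [0.0] * (n_max + 1)           # M(0) = 0
    for n in range(1, n_max + 1):
        F_n = sum(p[:n])              # F_X(n) = p_1 + ... + p_n
        # Sum of p_k * M(n - k) over k = 1, ..., n - 1 (terms with p_k = 0 dropped)
        conv = sum(p[k - 1] * M[n - k]
                   for k in range(1, min(n - 1, len(p)) + 1))
        M[n] = F_n + conv
    return M

M = renewal_function([0.1, 0.4, 0.3, 0.2], 10)
print([round(m, 4) for m in M[1:]])
```

For the numerical example, the first four entries reproduce $M(1), \ldots, M(4)$ as computed by hand.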

The average costs are shown in the following table:

Block Period $K$     Cost $\theta(K) = \dfrac{c_1 + c_2 M(K - 1)}{K}$
       1                  2.00000
       2                  1.15000
       3                  1.17667
       4                  1.16825
       5                  1.19386

The minimum cost block period is $K^* = 2$.
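The cost table can be reproduced with a few lines of Python. This is a sketch under the assumptions of the numerical example ($c_1 = 2$, $c_2 = 3$, and the four-point life distribution); the name `block_cost_rates` is hypothetical, and the renewal recursion is repeated inside the function so that the block is self-contained.

```python
def block_cost_rates(p, c1, c2, k_max):
    """Return {K: theta(K)} with theta(K) = (c1 + c2 * M(K - 1)) / K."""
    M = [0.0]                                   # M(0) = 0
    for n in range(1, k_max):                   # build M(1), ..., M(k_max - 1)
        F_n = sum(p[:n])
        M.append(F_n + sum(p[k - 1] * M[n - k]
                           for k in range(1, min(n - 1, len(p)) + 1)))
    return {K: (c1 + c2 * M[K - 1]) / K for K in range(1, k_max + 1)}

rates = block_cost_rates([0.1, 0.4, 0.3, 0.2], c1=2.0, c2=3.0, k_max=5)
best_K = min(rates, key=rates.get)              # block period with least cost
for K, theta in rates.items():
    print(K, round(theta, 5))
```

Changing `c1` and `c2` shows how the optimal block period moves as the two replacement costs approach one another.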
We wish to elicit one more insight from this example. Forgetting about 
block replacement, we continue to calculate 


M(5) = 1.6617, 
M(6) = 2.0647, 
M(7) = 2.4463, 
M(8) = 2.8336, 
M(9) = 3.2136, 


M(10) = 3.6016. 


Let $u_n$ be the probability that a replacement occurs in period $n$. Then
$M(n) = M(n - 1) + u_n$ asserts that the mean replacements up to time $n$ is the
mean replacements up to time $n - 1$ plus the probability that a replacement
occurs in period $n$. The calculations are shown in the following table:

  $n$     $u_n = M(n) - M(n - 1)$
   1          0.1000
   2          0.4100
   3          0.3810
   4          0.4321
   5          0.3386
   6          0.4030
   7          0.3816
   8          0.3873
   9          0.3800
  10          0.3880

The probability of a replacement in period $n$ seems to be converging.
This is indeed the case, and the limit is the reciprocal of the mean bulb
lifetime:

$$\frac{1}{E[X_1]} = \frac{1}{0.1(1) + 0.4(2) + 0.3(3) + 0.2(4)} = 0.3846\cdots.$$

This calculation makes sense. If a light bulb lasts, on the average, $E[X_1]$
time units, then the probability that it will need to be replaced in any
period should approximate $1/E[X_1]$. Actually, the relationship is not as
simple as just stated. Further discussion takes place in Sections 4 and 6.
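The convergence of $u_n$ can be observed directly by carrying the computation further. The sketch below does not difference $M(n)$; instead it uses the standard renewal recursion $u_n = \sum_{k=1}^{n} p_k u_{n-k}$ with $u_0 = 1$, which is introduced here for illustration and is not derived in this section. The function name `replacement_probabilities` is likewise hypothetical.

```python
def replacement_probabilities(p, n_max):
    """Return [u_0, ..., u_n_max], where u_n = Pr{replacement in period n}."""
    u = [1.0]   # u_0 = 1: a fresh bulb is installed at time 0
    for n in range(1, n_max + 1):
        # Condition on the life k of the most recently installed bulb.
        u.append(sum(p[k - 1] * u[n - k]
                     for k in range(1, min(n, len(p)) + 1)))
    return u

p = [0.1, 0.4, 0.3, 0.2]
u = replacement_probabilities(p, 200)
mean_life = sum(k * pk for k, pk in enumerate(p, start=1))   # E[X_1] = 2.6
print(u[1:5], u[200], 1.0 / mean_life)
```

The early values agree with the table, and by period 200 the value is indistinguishable from $1/E[X_1]$ at the printed precision.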


Exercises 


2.1. Let $\{X_n; n = 0, 1, \ldots\}$ be a two-state Markov chain with the
transition probability matrix

$$\mathbf{P} = \begin{array}{c|cc} & 0 & 1 \\ \hline 0 & 1 - a & a \\ 1 & b & 1 - b \end{array}$$

State 0 represents an operating state of some system, while state 1
represents a repair state. We assume that the process begins in state $X_0 = 0$,
and then the successive returns to state 0 from the repair state form a
renewal process. Determine the mean duration of one of these renewal
intervals.


2.2. A certain type component has two states: 0 = OFF and
1 = OPERATING. In state 0, the process remains there a random length
of time, which is exponentially distributed with parameter $\alpha$, and then
moves to state 1. The time in state 1 is exponentially distributed with
parameter $\beta$, after which the process returns to state 0.

The system has two of these components, A and B, with distinct
parameters:

Component     Operating Failure Rate     Repair Rate
    A               $\beta_A$                $\alpha_A$
    B               $\beta_B$                $\alpha_B$

In order for the system to operate, at least one of components A and B
must be operating (a parallel system). Assume that the component
stochastic processes are independent of one another. Consider the successive
instants that the system enters the failed state from an operating state. Use
the memoryless property of the exponential distribution to argue that these
instants form a renewal process.


2.3. Calculate the mean number of renewals $M(n) = E[N(n)]$ for the
renewal process having interoccurrence distribution

$$p_1 = 0.4, \quad p_2 = 0.1, \quad p_3 = 0.3, \quad p_4 = 0.2$$

for $n = 1, 2, \ldots, 10$. Also calculate $u_n = M(n) - M(n - 1)$.


Problems 


2.1. For the block replacement example of this section for which
$p_1 = 0.1$, $p_2 = 0.4$, $p_3 = 0.3$, and $p_4 = 0.2$, suppose the costs are $c_1 = 4$
and $c_2 = 5$. Determine the minimal cost block period $K^*$ and the cost of
replacing upon failure alone.


2.2. Let $X_1, X_2, \ldots$ be the interoccurrence times in a renewal process.
Suppose $\Pr\{X_k = 1\} = p$ and $\Pr\{X_k = 2\} = q = 1 - p$. Verify that

$$M(n) = E[N(n)] = \frac{n}{1 + q} - \frac{q^2}{(1 + q)^2}\,(1 - q^n)$$

for $n = 2, 4, 6, \ldots$.

2.3. Determine $M(n)$ when the interoccurrence times have the geometric
distribution

$$\Pr\{X_1 = k\} = p_k = \beta(1 - \beta)^{k-1} \qquad \text{for } k = 1, 2, \ldots,$$

where $0 < \beta < 1$.


3. The Poisson Process Viewed as a Renewal Process 


As mentioned earlier, the Poisson process with parameter $\lambda$ is a renewal
process whose interoccurrence times have the exponential distribution
$F(x) = 1 - e^{-\lambda x}$, $x \ge 0$. The memoryless property of the exponential
distribution (see Sections 4.2, 5.2 of I, and V) serves decisively in yielding
the explicit computation of a number of functionals of the Poisson
renewal process.


The Renewal Function Since $N(t)$ has a Poisson distribution, then

$$\Pr\{N(t) = k\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \qquad k = 0, 1, \ldots,$$

and

$$M(t) = E[N(t)] = \lambda t.$$


Excess Life Observe that the excess life at time $t$ exceeds $x$ if and only
if there are no renewals in the interval $(t, t + x]$ (Figure 3.1). This event
has the same probability as that of no renewals in the interval $(0, x]$, since
a Poisson process has stationary independent increments. In formal terms,
we have

$$\begin{aligned}
\Pr\{\gamma_t > x\} &= \Pr\{N(t + x) - N(t) = 0\} \\
&= \Pr\{N(x) = 0\} = e^{-\lambda x}.
\end{aligned} \tag{3.1}$$

Thus, in a Poisson process, the excess life possesses the same exponential
distribution

$$\Pr\{\gamma_t \le x\} = 1 - e^{-\lambda x}, \qquad x \ge 0, \tag{3.2}$$

as every life, another manifestation of the memoryless property of the
exponential distribution.


Figure 3.1 The excess life $\gamma_t$ exceeds $x$ if and only if there are no renewals
in the interval $(t, t + x]$.


Current Life The current life $\delta_t$, of course, cannot exceed $t$, while for
$x < t$ the current life exceeds $x$ if and only if there are no renewals in
$(t - x, t]$, which again has probability $e^{-\lambda x}$. Thus the current life follows
the truncated exponential distribution

$$\Pr\{\delta_t \le x\} = \begin{cases} 1 - e^{-\lambda x} & \text{for } 0 \le x < t, \\ 1 & \text{for } t \le x. \end{cases} \tag{3.3}$$


Mean Total Life Using the evaluation of I, (5.3) for the mean of a
nonnegative random variable, we have

$$\begin{aligned}
E[\beta_t] &= E[\gamma_t] + E[\delta_t] \\
&= \frac{1}{\lambda} + \int_0^t \Pr\{\delta_t > x\}\, dx \\
&= \frac{1}{\lambda} + \int_0^t e^{-\lambda x}\, dx \\
&= \frac{1}{\lambda} + \frac{1}{\lambda}\big(1 - e^{-\lambda t}\big).
\end{aligned}$$

Observe that the mean total life is significantly larger than the mean life
$1/\lambda = E[X_k]$ of any particular renewal interval. A more striking expression
of this phenomenon is revealed when $t$ is large, where the process has
been in operation for a long duration. Then the mean total life $E[\beta_t]$ is
approximately twice the mean life. These facts appear at first paradoxical.

Let us reexamine the definition of the total life $\beta_t$ with a view to
explaining on an intuitive basis the seeming discrepancy. First, an arbitrary
time point $t$ is fixed. Then $\beta_t$ measures the length of the renewal interval
containing the point $t$. Such a procedure will tend with higher likelihood
to favor a lengthy renewal interval rather than one of short duration. The
phenomenon is known as length-biased sampling and occurs, well
disguised, in a number of sampling situations.
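Length-biased sampling is easy to observe in a simulation. The following Python sketch is an illustration only: it assumes a Poisson process with rate `rate`, records the length of the renewal interval that covers the fixed time $t$, and compares the sample mean with the expression for $E[\beta_t]$ obtained above. The function name `mean_total_life`, the seed, and the parameter values are arbitrary choices for the example.

```python
import math
import random

def mean_total_life(rate, t, trials, seed=0):
    """Estimate E[beta_t], the mean length of the renewal interval covering t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        start = 0.0                       # left endpoint of the current interval
        while True:
            life = rng.expovariate(rate)
            if start + life > t:          # this interval contains the point t
                total += life
                break
            start += life
    return total / trials

rate, t = 1.0, 5.0
estimate = mean_total_life(rate, t, trials=100_000)
exact = 1.0 / rate + (1.0 - math.exp(-rate * t)) / rate
print(estimate, exact)
```

With these values both numbers are close to 2, twice the mean life $1/\lambda = 1$ of an ordinary interval.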


Joint Distribution of $\gamma_t$ and $\delta_t$ The joint distribution of $\gamma_t$ and $\delta_t$ is
determined in the same manner as the marginals. In fact, for any $x > 0$ and
$0 < y < t$, the event $\{\gamma_t > x, \delta_t > y\}$ occurs if and only if there are no
renewals in the interval $(t - y, t + x]$, which has probability $e^{-\lambda(x+y)}$. Thus

$$\Pr\{\gamma_t > x, \delta_t > y\} = \begin{cases} e^{-\lambda(x+y)} & \text{if } x > 0,\ 0 < y < t, \\ 0 & \text{if } y \ge t. \end{cases} \tag{3.4}$$

For the Poisson process, observe that $\gamma_t$ and $\delta_t$ are independent, since their
joint distribution factors as the product of their marginal distributions.


Exercises 


3.1. Let $W_1, W_2, \ldots$ be the event times in a Poisson process $\{N(t); t \ge 0\}$
of rate $\lambda$. Evaluate

$$\Pr\{W_{N(t)+1} > t + s\} \quad \text{and} \quad E[W_{N(t)+1}].$$

3.2. Particles arrive at a counter according to a Poisson process of rate
$\lambda$. An arriving particle is recorded with probability $p$ and lost with
probability $1 - p$ independently of the other particles. Show that the sequence
of recorded particles is a Poisson process of rate $\lambda p$.

3.3. Let $W_1, W_2, \ldots$ be the event times in a Poisson process $\{N(t); t \ge 0\}$
of rate $\lambda$. Show that

$$N(t) \quad \text{and} \quad W_{N(t)+1}$$

are independent random variables by evaluating

$$\Pr\{N(t) = n \text{ and } W_{N(t)+1} > t + s\}.$$


Problems 


3.1. In another form of sum quota sampling (see V, Section 4.2), a
sequence of nonnegative independent and identically distributed random
variables $X_1, X_2, \ldots$ is observed, the sampling continuing until the first
time that the sum of the observations exceeds the quota $t$. In renewal
process terminology, the sample size is $N(t) + 1$. The sample mean is

$$\frac{W_{N(t)+1}}{N(t) + 1} = \frac{X_1 + \cdots + X_{N(t)+1}}{N(t) + 1}.$$

An important question in statistical theory is whether or not this sample
mean is unbiased. That is, how does the expected value of this sample
mean relate to the expected value of, say, $X_1$? Assume that the individual
$X$ summands are exponentially distributed with parameter $\lambda$, so that $N(t)$
is a Poisson process, and evaluate the expected value of the foregoing
sample mean and show that

$$E\Big[\frac{W_{N(t)+1}}{N(t) + 1}\Big] = \frac{1}{\lambda}\big(1 - e^{-\lambda t}\big)\Big(1 + \frac{1}{\lambda t}\Big).$$

Hint: Use the result of the previous exercise, that

$$W_{N(t)+1} \quad \text{and} \quad N(t)$$

are independent, and then evaluate separately

$$E[W_{N(t)+1}] \quad \text{and} \quad E\Big[\frac{1}{N(t) + 1}\Big].$$


3.2. A fundamental identity involving the renewal function, valid for all
renewal processes, is

$$E[W_{N(t)+1}] = E[X_1](M(t) + 1).$$

See equation (1.7). Evaluate the left side and verify the identity when the
renewal counting process is a Poisson process.


3.3. Pulses arrive at a counter according to a Poisson process of rate $\lambda$.
All physically realizable counters are imperfect, incapable of detecting all 
signals that enter their detection chambers. After a particle or signal ar- 
rives, a counter must recuperate, or renew itself, in preparation for the 
next arrival. Signals arriving during the readjustment period, called dead 
time or locked time, are lost. We must distinguish between the arriving 
particles and the recorded particles. The experimenter observes only the 
particles recorded; from this observation he desires to infer the properties 
of the arrival process. 

Suppose that each arriving pulse locks the counter for a fixed time $T$.
Determine the probability $p(t)$ that the counter is free at time $t$.


3.4. This problem is designed to aid in the understanding of
length-biased sampling. Let $X$ be a uniformly distributed random variable on
$[0, 1]$. Then $X$ divides $[0, 1]$ into the subintervals $[0, X]$ and $(X, 1]$. By
symmetry, each subinterval has mean length $\frac{1}{2}$. Now pick one of these
subintervals at random in the following way: Let $Y$ be independent of
$X$ and uniformly distributed on $[0, 1]$, and pick the subinterval $[0, X]$ or
$(X, 1]$ that $Y$ falls in. Let $L$ be the length of the subinterval so chosen.
Formally,

$$L = \begin{cases} X & \text{if } Y \le X, \\ 1 - X & \text{if } Y > X. \end{cases}$$

Determine the mean of $L$.


3.5. Birds are perched along a wire as shown according to a Poisson
process of rate $\lambda$ per unit distance:

[Figure: birds perched along a wire, with the point 0 marked.]

At a fixed point $t$ along the wire, let $D(t)$ be the random distance to the
nearest bird. What is the mean value of $D(t)$? What is the probability
density function $f(x)$ for $D(t)$?


4. The Asymptotic Behavior of Renewal Processes 


A large number of the functionals that have explicit expressions for 
Poisson renewal processes are far more difficult to compute for other re- 
newal processes. There are, however, many simple formulas that describe 
the asymptotic behavior, for large values of t, of a general renewal 
process. We summarize some of these asymptotic results in this section. 


4.1. The Elementary Renewal Theorem 


The Poisson process is the only renewal process (in continuous time)
whose renewal function $M(t) = E[N(t)]$ is exactly linear. All renewal
functions are asymptotically linear, however, in the sense that

$$\lim_{t\to\infty} \frac{M(t)}{t} = \lim_{t\to\infty} \frac{E[N(t)]}{t} = \frac{1}{\mu}, \tag{4.1}$$

where $\mu = E[X_k]$ is the mean interoccurrence time. This fundamental
result, known as the elementary renewal theorem, is invoked repeatedly
to compute functionals describing the long run behavior of stochastic
models having renewal processes associated with them.

The elementary renewal theorem (4.1) holds even when the interoccurrence
times have infinite mean, and then $\lim_{t\to\infty} M(t)/t = 1/\infty = 0$.

The elementary renewal theorem is so intuitively plausible that it has
often been viewed as obvious. The left side, $\lim_{t\to\infty} M(t)/t$, describes the
long run mean number of renewals or replacements per unit time. The
right side, $1/\mu$, is the reciprocal of the mean life of a component. Isn’t it
obvious that if a component lasts, on the average, $\mu$ time units, then in the
long run these components will be replaced at the rate of $1/\mu$ per unit
time? However plausible and convincing this argument may be, it is not
obvious, and to establish the elementary renewal theorem requires several
steps of mathematical analysis, beginning with the law of large numbers.
As our main concern is stochastic modeling, we omit this derivation, as
well as the derivations of the other asymptotic results summarized in this
section, in order to give more space to their application.
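Although the proof is omitted, the statement (4.1) is simple to probe by simulation. The following Python sketch is illustrative only: it assumes lifetimes uniformly distributed on $[0, 1]$, so that $\mu = 1/2$ and $1/\mu = 2$, and it estimates $M(t)$ by averaging $N(t)$ over independent sample paths. The function name `estimate_renewal_function` and the parameter values are hypothetical.

```python
import random

def estimate_renewal_function(sample_lifetime, t, trials, seed=0):
    """Monte Carlo estimate of M(t) = E[N(t)]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        clock, count = sample_lifetime(rng), 0
        while clock <= t:              # a renewal occurred at or before time t
            count += 1
            clock += sample_lifetime(rng)
        total += count
    return total / trials

t = 50.0
M_t = estimate_renewal_function(lambda rng: rng.random(), t, trials=2000)
estimate = M_t / t     # should be near 1/mu = 2 for uniform(0, 1) lifetimes
print(estimate)
```

Increasing `t` brings the ratio closer to $1/\mu$, at the price of a longer run.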


Example Age Replacement Policies Let $X_1, X_2, \ldots$ represent the
lifetimes of items (light bulbs, transistor cards, machines, etc.) that are
successively placed in service, the next item commencing service
immediately following the failure of the previous one. We stipulate that $\{X_k\}$ are
independent and identically distributed positive random variables with
finite mean $\mu = E[X_k]$. The elementary renewal theorem tells us to expect
to replace items over the long run at a mean rate of $1/\mu$ per unit time.

In the long run, any replacement strategy that substitutes items prior to
their failure will use more than $1/\mu$ items per unit time. Nonetheless,
where there is some benefit in avoiding failure in service, and where units 
deteriorate, in some sense, with age, there may be an economic or relia- 
bility advantage in considering alternative replacement strategies. Tele- 
phone or utility poles serve as good illustrations of this concept. Clearly, 
it is disadvantageous to allow these poles to fail in service because of the 
damage to the wires they carry, the damage to adjoining property, over- 
time wages paid for emergency replacements, and revenue lost while ser- 
vice is down. Therefore, an attempt is usually made to replace older util- 
ity poles before they fail. Other instances of planned replacement occur in 
preventative maintenance strategies for aircraft, where “time” is now 
measured by operating hours.

An age replacement policy calls for replacing an item upon its failure
or upon its reaching age $T$, whichever occurs first. Arguing intuitively, we
would expect that the long run fraction of failure replacements, items that
fail before age $T$, will be $F(T)$, and the corresponding fraction of
(conceivably less expensive) planned replacements will be $1 - F(T)$. A
renewal interval for this modified age replacement policy obviously follows
a distribution law

$$F_T(x) = \begin{cases} F(x) & \text{for } x < T, \\ 1 & \text{for } x \ge T, \end{cases}$$

and the mean renewal duration is

$$\mu_T = \int_0^\infty \{1 - F_T(x)\}\, dx = \int_0^T \{1 - F(x)\}\, dx < \mu.$$

The elementary renewal theorem indicates that the long run mean
replacement rate under age replacement is increased to $1/\mu_T$.

Now, let $Y_1, Y_2, \ldots$ denote the times between actual successive failures.
The random variable $Y_1$ is composed of a random number of time periods
of length $T$ (corresponding to replacements not associated with failures),
plus a last time period in which the distribution is that of a failure
conditioned on failure before age $T$; that is, $Y_1$ has the distribution of $NT + Z$,
where

$$\Pr\{N \ge k\} = \{1 - F(T)\}^k, \qquad k = 0, 1, \ldots,$$

and

$$\Pr\{Z \le z\} = \frac{F(z)}{F(T)}, \qquad 0 \le z \le T.$$

Hence,

$$\begin{aligned}
E[Y_1] &= \frac{1}{F(T)} \Big\{ T[1 - F(T)] + \int_0^T (F(T) - F(x))\, dx \Big\} \\
&= \frac{1}{F(T)} \int_0^T \{1 - F(x)\}\, dx = \frac{\mu_T}{F(T)}.
\end{aligned}$$

The sequence of random variables for interoccurrence times of the bona
fide failure $\{Y_i\}$ generates a renewal process whose mean rate of failures
per unit time in the long run is $1/E[Y_1]$. This inference again relies on the
elementary renewal theorem. Depending on $F$, the modified failure rate
$1/E[Y_1]$ may possibly yield a lower failure rate than $1/\mu$, the rate when
replacements are made only upon failure.

Let us suppose that each replacement, whether planned or not, costs $K$
dollars, and that each failure incurs an additional penalty of $c$ dollars.
Multiplying these costs by the appropriate rates gives the long run mean
cost per unit time as a function of the replacement age $T$:

$$C(T) = \frac{K}{\mu_T} + \frac{c}{E[Y_1]} = \frac{K + cF(T)}{\int_0^T [1 - F(x)]\, dx}.$$

In any particular situation, a routine calculus exercise or recourse to
numerical computation produces the value of $T$ that minimizes the long run
cost rate. For example, if $K = 1$, $c = 4$, and lifetimes are uniformly
distributed on $[0, 1]$, then $F(x) = x$ for $0 \le x \le 1$, and

$$\int_0^T [1 - F(x)]\, dx = T\Big(1 - \frac{T}{2}\Big),$$

and

$$C(T) = \frac{1 + 4T}{T(1 - T/2)}.$$

To obtain the cost minimizing $T$, we differentiate $C(T)$ with respect to $T$
and equate to zero, thereby obtaining

$$\begin{aligned}
\frac{dC(T)}{dT} = 0 &= \frac{4T(1 - T/2) - (1 + 4T)(1 - T)}{[T(1 - T/2)]^2}, \\
0 &= 4T - 2T^2 - 1 + T - 4T + 4T^2, \\
0 &= 2T^2 + T - 1, \\
T &= \frac{-1 \pm \sqrt{1 + 8}}{4} = \Big(\frac{1}{2}, -1\Big),
\end{aligned}$$

and the optimal choice is $T^* = \frac{1}{2}$. Routine calculus will verify that this
choice leads to a minimum cost, and not a maximum or inflection point.
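The "recourse to numerical computation" mentioned above can be as simple as a grid search. The following Python sketch assumes the same setting as the example ($K = 1$, $c = 4$, lifetimes uniform on $[0, 1]$); the function name `cost_rate` and the grid spacing of 0.001 are arbitrary choices for this illustration.

```python
def cost_rate(T, K=1.0, c=4.0):
    """Long run cost per unit time C(T) for lifetimes uniform on [0, 1]."""
    F_T = T                        # F(T) = T for 0 <= T <= 1
    mu_T = T * (1.0 - T / 2.0)     # integral of 1 - F(x) over [0, T]
    return (K + c * F_T) / mu_T

grid = [i / 1000 for i in range(1, 1001)]   # candidate ages T in (0, 1]
T_star = min(grid, key=cost_rate)
print(T_star, cost_rate(T_star), cost_rate(1.0))
```

The last printed value, $C(1)$, is the cost rate when every item is allowed to fail, since no uniform lifetime exceeds 1; it gives a baseline against which the age replacement policy can be judged.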


4.2. The Renewal Theorem for Continuous Lifetimes


The elementary renewal theorem asserts that

$$\lim_{t\to\infty} \frac{M(t)}{t} = \frac{1}{\mu}.$$

It is tempting to conclude from this that $M(t)$ behaves like $t/\mu$ as $t$ grows
large, but the precise meaning of the phrase “behaves like” is rather subtle.
For example, suppose that all of the lifetimes are deterministic, say
$X_k = 1$ for $k = 1, 2, \ldots$. Then it is straightforward to calculate

$$M(t) = N(t) = \begin{cases} 0 & \text{for } 0 \le t < 1, \\ 1 & \text{for } 1 \le t < 2, \\ \ \vdots \\ k & \text{for } k \le t < k + 1. \end{cases}$$

That is, $M(t) = [t]$, where $[t]$ denotes the greatest integer not exceeding $t$.
Since $\mu = 1$ in this example, then $M(t) - t/\mu = [t] - t$, a function that
oscillates indefinitely between 0 and $-1$. While it remains true in this
illustration that $M(t)/t = [t]/t \to 1 = 1/\mu$, it is not clear in what sense $M(t)$
“behaves like” $t/\mu$. If we rule out the periodic behavior that is exemplified in
the extreme by this deterministic example, then $M(t)$ behaves like $t/\mu$ in
the sense described by the renewal theorem, which we now explain. Let
$M(t, t + h] = M(t + h) - M(t)$ denote the mean number of renewals in
the interval $(t, t + h]$. The renewal theorem asserts that when periodic
behavior is precluded, then

$$\lim_{t\to\infty} M(t, t + h] = h/\mu \qquad \text{for any fixed } h > 0. \tag{4.2}$$

In words, asymptotically, the mean number of renewals in an interval is
proportional to the interval’s length, with proportionality constant $1/\mu$.

A simple and prevalent situation in which the renewal theorem (4.2) is
valid occurs when the lifetimes $X_1, X_2, \ldots$ are continuous random
variables having the probability density function $f(x)$. In this circumstance,
the renewal function is differentiable, and

$$m(t) = \frac{dM(t)}{dt} = \sum_{n=1}^{\infty} f_n(t), \tag{4.3}$$

where $f_n(t)$ is the probability density function for $W_n = X_1 + \cdots + X_n$.
Now (4.2) may be written in the form

$$\frac{M(t + h) - M(t)}{h} \to \frac{1}{\mu} \qquad \text{as } t \to \infty,$$

which, when $h$ is small, suggests that

$$\lim_{t\to\infty} m(t) = \lim_{t\to\infty} \frac{dM(t)}{dt} = \frac{1}{\mu}, \tag{4.4}$$

and indeed, this is the case in all but the most pathological of
circumstances when $X_1, X_2, \ldots$ are continuous random variables.

If in addition to being continuous, the lifetimes $X_1, X_2, \ldots$ have a finite
mean $\mu$ and finite variance $\sigma^2$, then the renewal theorem can be refined to
include a second term. Under the stated conditions, we have

$$\lim_{t\to\infty} \Big[ M(t) - \frac{t}{\mu} \Big] = \frac{\sigma^2 - \mu^2}{2\mu^2}. \tag{4.5}$$

Example When the lifetimes $X_1, X_2, \ldots$ have the gamma density
function

$$f(x) = x e^{-x} \qquad \text{for } x > 0, \tag{4.6}$$

then the waiting times $W_n = X_1 + \cdots + X_n$ have the gamma density

$$f_n(x) = \frac{x^{2n-1}}{(2n - 1)!} e^{-x} \qquad \text{for } x > 0,$$

as may be verified by performing the appropriate convolutions. (See I,
Section 2.5.) Substitution into (4.3) yields

$$\begin{aligned}
m(x) = \sum_{n=1}^{\infty} f_n(x) &= e^{-x} \sum_{n=1}^{\infty} \frac{x^{2n-1}}{(2n - 1)!} \\
&= e^{-x} \Big( \frac{e^{x} - e^{-x}}{2} \Big) = \frac{1}{2}\big[1 - e^{-2x}\big],
\end{aligned}$$

and

$$M(t) = \int_0^t m(x)\, dx = \frac{1}{2} t - \frac{1}{4}\big[1 - e^{-2t}\big].$$

Since the gamma density in (4.6) has moments $\mu = 2$ and $\sigma^2 = 2$, we
verify that $m(t) \to 1/\mu$ as $t \to \infty$ and $M(t) - t/\mu \to -\frac{1}{4} = (\sigma^2 - \mu^2)/2\mu^2$, in
agreement with (4.4) and (4.5).
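The closed forms in this example can be confirmed numerically. The following Python sketch sums the series (4.3) term by term and compares it with $\frac{1}{2}[1 - e^{-2x}]$, and then evaluates $M(t) - t/\mu$ for a moderately large $t$. The function names and the truncation at 60 terms are choices made for this illustration only.

```python
import math

def renewal_density_series(x, terms=60):
    """Partial sum of f_n(x) = x^(2n-1) e^(-x) / (2n-1)! over n = 1..terms."""
    return sum(x ** (2 * n - 1) * math.exp(-x) / math.factorial(2 * n - 1)
               for n in range(1, terms + 1))

def renewal_density_closed(x):
    """Closed form m(x) = (1 - e^(-2x)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * x))

def renewal_function(t):
    """Closed form M(t) = t/2 - (1 - e^(-2t)) / 4."""
    return 0.5 * t - 0.25 * (1.0 - math.exp(-2.0 * t))

mu, sigma2 = 2.0, 2.0
print(renewal_density_series(3.0), renewal_density_closed(3.0))
print(renewal_function(20.0) - 20.0 / mu, (sigma2 - mu ** 2) / (2.0 * mu ** 2))
```

Each printed pair should agree to many decimal places, the second pair illustrating the limit in (4.5).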


4.3. The Asymptotic Distribution of N(t)


The elementary renewal theorem

$$\lim_{t\to\infty} \frac{E[N(t)]}{t} = \frac{1}{\mu} \tag{4.7}$$

implies that the asymptotic mean of $N(t)$ is approximately $t/\mu$. When
$\mu = E[X_k]$ and $\sigma^2 = \operatorname{Var}[X_k] = E[(X_k - \mu)^2]$ are finite, then the
asymptotic variance of $N(t)$ behaves according to

$$\lim_{t\to\infty} \frac{\operatorname{Var}[N(t)]}{t} = \frac{\sigma^2}{\mu^3}. \tag{4.8}$$

That is, the asymptotic variance of $N(t)$ is approximately $t\sigma^2/\mu^3$. If we
standardize $N(t)$ by subtracting its asymptotic mean and dividing by its
asymptotic standard deviation, we get the following convergence to the
normal distribution:

$$\lim_{t\to\infty} \Pr\Big\{ \frac{N(t) - t/\mu}{\sqrt{t\sigma^2/\mu^3}} \le x \Big\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$

In words, for large values of $t$ the number of renewals $N(t)$ is approximately
normally distributed with mean and variance given by (4.7) and
(4.8), respectively.
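In practice the normal limit is used to approximate probabilities such as $\Pr\{N(t) \le n\}$. The following Python sketch shows one way to do this with the standard library; the function names are hypothetical, no continuity correction is applied, and the numbers use the gamma lifetimes of (4.6), for which $\mu = 2$ and $\sigma^2 = 2$.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def approx_prob_at_most(n, t, mu, sigma2):
    """Normal approximation to Pr{N(t) <= n} for large t."""
    mean = t / mu                              # asymptotic mean, from (4.7)
    std = math.sqrt(t * sigma2 / mu ** 3)      # asymptotic std dev, from (4.8)
    return normal_cdf((n - mean) / std)

# With t = 200 the approximate mean is 100 and the approximate variance is 50.
print(approx_prob_at_most(100, 200.0, 2.0, 2.0))
print(approx_prob_at_most(110, 200.0, 2.0, 2.0))
```

The approximation is only as good as $t$ is large; for small $t$ it should be treated as a rough guide.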


4.4. The Limiting Distribution of Age and Excess Life 


Again we assume that the lifetimes $X_1, X_2, \ldots$ are continuous random
variables with finite mean $\mu$. Let $\gamma_t = W_{N(t)+1} - t$ be the excess life at time
$t$. The excess life has the limiting distribution

$$\lim_{t\to\infty} \Pr\{\gamma_t \le x\} = \frac{1}{\mu} \int_0^x [1 - F(y)]\, dy. \tag{4.9}$$

The reader should verify that the right side of (4.9) defines a valid
distribution function, which we denote by $H(x)$. The corresponding probability
density function is $h(y) = \mu^{-1}[1 - F(y)]$. The mean of this limiting
distribution is determined according to

$$\begin{aligned}
\int_0^\infty y h(y)\, dy &= \frac{1}{\mu} \int_0^\infty y[1 - F(y)]\, dy \\
&= \frac{1}{\mu} \int_0^\infty y \Big\{ \int_y^\infty f(t)\, dt \Big\} dy \\
&= \frac{1}{\mu} \int_0^\infty f(t) \Big\{ \int_0^t y\, dy \Big\} dt \\
&= \frac{1}{2\mu} \int_0^\infty t^2 f(t)\, dt \\
&= \frac{\sigma^2 + \mu^2}{2\mu},
\end{aligned}$$

where $\sigma^2$ is the common variance of the lifetimes $X_1, X_2, \ldots$.

The limiting distribution for the current life, or age, $\delta_t = t - W_{N(t)}$, can
be deduced from the corresponding result (4.9) for the excess life. With
the aid of Figure 4.1, corroborate the equivalence

$$\{\gamma_t \ge x \text{ and } \delta_t \ge y\} \quad \text{if and only if} \quad \{\gamma_{t-y} \ge x + y\}.$$

Figure 4.1 $\{\delta_t \ge y$ and $\gamma_t \ge x\}$ if and only if $\{\gamma_{t-y} \ge x + y\}$.

It follows that

$$\begin{aligned}
\lim_{t\to\infty} \Pr\{\gamma_t \ge x, \delta_t \ge y\} &= \lim_{t\to\infty} \Pr\{\gamma_{t-y} \ge x + y\} \\
&= \mu^{-1} \int_{x+y}^\infty [1 - F(z)]\, dz,
\end{aligned}$$

exhibiting the joint limiting distribution of $(\gamma_t, \delta_t)$. In particular,

$$\begin{aligned}
\lim_{t\to\infty} \Pr\{\delta_t \ge y\} &= \lim_{t\to\infty} \Pr\{\gamma_t \ge 0, \delta_t \ge y\} \\
&= \mu^{-1} \int_y^\infty [1 - F(z)]\, dz \\
&= 1 - H(y).
\end{aligned} \tag{4.10}$$
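The formula $(\sigma^2 + \mu^2)/2\mu$ for the mean of the limiting excess life can be checked by numerical integration. The sketch below is an illustration under stated assumptions: it uses the gamma lifetimes of (4.6), whose distribution function is $F(y) = 1 - (1 + y)e^{-y}$ with $\mu = 2$ and $\sigma^2 = 2$, truncates the integral at an arbitrary upper limit of 60, and applies the midpoint rule. The function names are hypothetical.

```python
import math

def limiting_excess_mean(F, mu, upper, steps=200_000):
    """Midpoint-rule integral of y * (1 - F(y)) / mu over [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y * (1.0 - F(y)) / mu    # integrand y * h(y)
    return total * h

def F_gamma(y):
    """Distribution function of the density f(x) = x e^(-x)."""
    return 1.0 - (1.0 + y) * math.exp(-y)

mu, sigma2 = 2.0, 2.0
numeric = limiting_excess_mean(F_gamma, mu, upper=60.0)
formula = (sigma2 + mu ** 2) / (2.0 * mu)
print(numeric, formula)
```

Both values should be close to 1.5, which exceeds half the mean lifetime, as length-biased sampling would lead one to expect.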


Exercises 


4.1. Consider the triangular lifetime density $f(x) = 2x$ for $0 < x < 1$.
Determine an asymptotic expression for the expected number of renewals
up to time $t$.

Hint: Use equation (4.5).

4.2. Consider the triangular lifetime density $f(x) = 2x$ for $0 < x < 1$.
Determine an asymptotic expression for the probability distribution of
excess life. Using this distribution, determine the limiting mean excess life
and compare with the general result following equation (4.9).


4.3. Consider the triangular lifetime density function $f(x) = 2x$, for
$0 < x < 1$. Determine the optimal replacement age in an age replacement
model with replacement cost $K = 1$ and failure penalty $c = 4$ (cf. the
example in Section 4.1).

4.4. Show that the optimal age replacement policy is to replace upon
failure alone when lifetimes are exponentially distributed with parameter
$\lambda$. Can you provide an intuitive explanation?

4.5. What is the limiting distribution of excess life when renewal
lifetimes have the uniform density $f(x) = 1$, for $0 < x < 1$?


4.6. A machine can be in either of two states: “up” or “down.” It is up
at time zero and thereafter alternates between being up and down. The
lengths $X_1, X_2, \ldots$ of successive up times are independent and identically
distributed random variables with mean $\alpha$, and the lengths $Y_1, Y_2, \ldots$ of
successive down times are independent and identically distributed with
mean $\beta$.


(a) In the long run, what fraction of time is the machine up? 

(b) If the machine earns income at a rate of $13 per unit time while up, 
what is the long run total rate of income earned by the machine? 

(c) If each down time costs $7, regardless of how long the machine is 
down, what is the long run total down time cost per unit time? 


Problems 


4.1. Suppose that a renewal function has the form $M(t) = t + [1 - \exp(-at)]$.
Determine the mean and variance of the interoccurrence distribution.


4.2. A system is subject to failures. Each failure requires a repair time
that is exponentially distributed with rate parameter $\alpha$. The operating time
of the system until the next failure is exponentially distributed with rate
parameter $\beta$. The repair times and the operating times are all statistically
independent. Suppose that the system is operating at time 0. Using
equation (4.5), determine an approximate expression for the mean number of
failures up to time $t$, the approximation holding for $t \gg 0$.


4.3. Suppose that the life of a light bulb is a random variable $X$ with
hazard rate $h(x) = \theta x$ for $x > 0$. Each failed light bulb is immediately
replaced with a new one. Determine an asymptotic expression for the mean
age of the light bulb in service at time $t$, valid for $t \gg 0$.


4.4. A developing country is attempting to control its population growth
by placing restrictions on the number of children each family can have. 
This society places a high premium on female children, and it is felt that 
any policy that ignores the desire to have female children will fail. The 
proposed policy is to allow any married couple to have children up to the 
first female baby, at which point they must cease having children. Assume 
that male and female children are equally likely. The number of children 
in any family is a random variable N. In the population as a whole, what 
fraction of children are female? Use the elementary renewal theorem to 
justify your answer. 


4.5. A Markov chain $X_0, X_1, X_2, \ldots$ has the transition probability matrix

$$P = \begin{array}{c|ccc} & 0 & 1 & 2 \\ \hline 0 & 0.3 & 0.7 & 0 \\ 1 & 0.6 & 0 & 0.4 \\ 2 & 0 & 0.5 & 0.5 \end{array}$$


A sojourn in a state is an uninterrupted sequence of consecutive visits to 
that state. 


(a) Determine the mean duration of a typical sojourn in state 0. 
(b) Using renewal theory, determine the long run fraction of time that 
the process is in state 1. 


5. Generalizations and Variations 
on Renewal Processes 


5.1. Delayed Renewal Processes 


We continue to assume that $\{X_k\}$ are all independent positive random
variables, but only $X_2, X_3, \ldots$ (from the second on) are identically distributed
with distribution function $F$, while $X_1$ has possibly a different distribution
function $G$. Such a process is called a delayed renewal process. We have
all the ingredients for an ordinary renewal process except that the initial
time to the first renewal has a distribution different from that of the other
interoccurrence times.

A delayed renewal process will arise when the component in operation
at time t = 0 is not new, but all subsequent replacements are new. For ex- 
ample, suppose that the time origin is taken y time units after the start of 
an ordinary renewal process. Then the time to the first renewal after the 
origin in the delayed process will have the distribution of the excess life 
at time y of an ordinary renewal process. 

As before, let $W_0 = 0$ and $W_n = X_1 + \cdots + X_n$, and let $N(t)$ count the
number of renewals up to time $t$. But now it is essential to distinguish
between the mean number of renewals in the delayed process

$$M_D(t) = E[N(t)], \tag{5.1}$$

and the renewal function associated with the distribution $F$,

$$M(t) = \sum_{k=1}^{\infty} F_k(t). \tag{5.2}$$

For the delayed process the elementary renewal theorem is

$$\lim_{t\to\infty} \frac{M_D(t)}{t} = \frac{1}{\mu}, \quad \text{where } \mu = E[X_2], \tag{5.3}$$

and the renewal theorem states that

$$\lim_{t\to\infty} [M_D(t) - M_D(t - h)] = \frac{h}{\mu},$$

provided $X_2, X_3, \ldots$ are continuous random variables.


5.2. Stationary Renewal Processes 


A delayed renewal process for which the first life has the distribution
function

$$G(x) = \mu^{-1} \int_0^x \{1 - F(y)\}\,dy$$


is called a stationary renewal process. We are attempting to model a re- 
newal process that began indefinitely far in the past, so that the remaining 
life of the item in service at the origin has the limiting distribution of the 
excess life in an ordinary renewal process. We recognize G as this limit- 
ing distribution. 

It is anticipated that such a process exhibits a number of stationary, or
time-invariant, properties. For a stationary renewal process,

$$M_D(t) = E[N(t)] = \frac{t}{\mu} \tag{5.4}$$

and

$$\Pr\{\gamma_t^D \le x\} = G(x),$$

for all $t$. Thus, what is in general only an asymptotic renewal relation
becomes an identity, holding for all $t$, in a stationary renewal process.
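
A small Monte Carlo sketch can make identity (5.4) concrete. It assumes, purely for illustration, lifetimes that are uniform on (0, 1), so that $\mu = 1/2$ and $G(x) = 2x - x^2$. The function names, the seed, and the number of runs are choices made here and are not part of the text. For this particular $F$ the ordinary renewal function is known to be $M(t) = e^t - 1$ for $t \le 1$, which gives a useful contrast with $t/\mu$.

```python
import math
import random

rng = random.Random(12345)  # fixed seed so the sketch is reproducible


def uniform_life():
    """Ordinary lifetime: uniform on (0, 1), so mu = 1/2 (illustrative assumption)."""
    return rng.random()


def stationary_first_life():
    """First lifetime drawn from G(x) = 2x - x^2 by inverting G."""
    return 1.0 - math.sqrt(1.0 - rng.random())


def count_renewals(t, first_life, later_life):
    """Number of renewals in (0, t] for a (possibly delayed) renewal process."""
    clock = first_life()
    count = 0
    while clock <= t:
        count += 1
        clock += later_life()
    return count


def mean_renewals(t, first_life, later_life, runs=20000):
    """Monte Carlo estimate of E[N(t)]."""
    total = sum(count_renewals(t, first_life, later_life) for _ in range(runs))
    return total / runs


t = 0.7
m_stationary = mean_renewals(t, stationary_first_life, uniform_life)
m_ordinary = mean_renewals(t, uniform_life, uniform_life)
print("stationary:", m_stationary, "vs t/mu =", t / 0.5)
print("ordinary:  ", m_ordinary, "vs e^t - 1 =", math.exp(t) - 1.0)
```

Under these assumptions the stationary estimate should sit near $t/\mu = 1.4$, while the ordinary process started with a new item gives a noticeably smaller mean count.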


5.3. Cumulative and Related Processes 


Suppose associated with the $i$th unit, or lifetime interval, is a second
random variable $Y_i$ ($\{Y_i\}$ identically distributed) in addition to the lifetime $X_i$.
We allow $X_i$ and $Y_i$ to be dependent but assume that the pairs $(X_1, Y_1)$,
$(X_2, Y_2), \ldots$ are independent. We use the notation $F(x) = \Pr\{X_i \le x\}$,
$G(y) = \Pr\{Y_i \le y\}$, $\mu = E[X_i]$, and $\nu = E[Y_i]$.

A number of problems of practical and theoretical interest have a nat- 
ural formulation in those terms. 


Renewal Processes Involving Two Components to Each 
Renewal Interval 


Suppose that $Y_i$ represents a portion of the duration $X_i$. Figure 5.1
illustrates the model. There we have depicted the $Y$ portion occurring at the
beginning of the interval, but this assumption is not essential for the
results that follow.


Figure 5.1. A renewal process in which an associated random variable $Y_i$
represents a portion of the $i$th renewal interval.

Let $p(t)$ be the probability that $t$ falls in a $Y$ portion of some renewal
interval. When $X_1, X_2, \ldots$ are continuous random variables, the renewal
theorem implies the following important asymptotic evaluation:

$$\lim_{t\to\infty} p(t) = \frac{E[Y_1]}{E[X_1]}. \tag{5.5}$$

Here are some concrete examples.


A Replacement Model Consider a replacement model in which
replacement is not instantaneous. Let $Y_i$ be the operating time and $Z_i$ the lag
period preceding installment of the $(i + 1)$st operating unit. (The delay
in replacement can be conceived as a period of repair of the service unit.)
We assume that the sequence of times between successive replacements
$X_k = Y_k + Z_k$, $k = 1, 2, \ldots$, constitutes a renewal process. Then $p(t)$, the
probability that the system is in operation at time $t$, converges to
$E[Y_1]/E[X_1]$.


A Queueing Model A queueing process is a process in which cus- 
tomers arrive at some designated place where a service of some kind is 
being rendered, for example, at the teller’s window in a bank or beside the 
cashier at a supermarket. It is assumed that the time between arrivals, or 
interarrival time, and the time that is spent in providing service for a given 
customer are governed by probabilistic laws. 

If arrivals to a queue follow a Poisson process of intensity $\lambda$, then the
successive times $X_k$ from the commencement of the $k$th busy period to the
start of the next busy period form a renewal process. (A busy period is an
uninterrupted duration when the queue is not empty.) Each $X_k$ is composed
of a busy portion $Z_k$ and an idle portion $Y_k$. Then $p(t)$, the probability that
the queue is empty at time $t$, converges to $E[Y_1]/E[X_1]$. This example is
treated more fully in Chapter IX, which is devoted to queueing systems.


The Peter Principle The “Peter Principle” asserts that a worker will
be promoted until finally reaching a position in which he or she is
incompetent. When this happens, the person stays in that job until retirement.
Consider the following single job model of the Peter Principle: A person
is selected at random from the population and placed in the job. If the
person is competent, he or she remains in the job for a random time having
cumulative distribution function $F$ and mean $\mu$ and is promoted. If
incompetent, the person remains for a random time having cumulative
distribution function $G$ and mean $\nu > \mu$ and retires. Once the job is
vacated, another person is selected at random and the process repeats.
Assume that the infinite population contains the fraction $p$ of competent
people and $q = 1 - p$ incompetent ones.

In the long run, what fraction of time is the position held by an incom- 
petent person? 

A renewal occurs every time that the position is filled, and therefore the 
mean duration of a renewal cycle is 


$$E[X_k] = p\mu + (1 - p)\nu.$$

To answer the question, we let $Y_k = X_k$ if the $k$th person is incompetent,
and $Y_k = 0$ if the $k$th person is competent. Then the long run fraction of
time that the position is held by an incompetent person is

$$\frac{E[Y_1]}{E[X_1]} = \frac{(1 - p)\nu}{p\mu + (1 - p)\nu}.$$

Suppose that $p = \frac{1}{2}$ of the people are competent, and that $\nu = 10$, while
$\mu = 1$. Then

$$\frac{E[Y_1]}{E[X_1]} = \frac{(1/2)(10)}{(1/2)(10) + (1/2)(1)} = \frac{10}{11} = 0.91.$$

Thus, while half of the people in the population are competent, the job is
filled by a competent person only 9 percent of the time!
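
The ratio above can be checked numerically. The sketch below evaluates the formula and compares it with a simulation of successive job holders. The text leaves $F$ and $G$ unspecified, so exponential tenure distributions are assumed here purely for illustration; the function names, seed, and sample size are likewise choices made for this sketch.

```python
import random


def incompetent_fraction(p, mu, nu):
    """Long run fraction of time held by an incompetent person: E[Y]/E[X]."""
    return (1.0 - p) * nu / (p * mu + (1.0 - p) * nu)


def simulate_incompetent_fraction(p, mu, nu, holders=200_000, seed=1):
    """Simulate job holders; tenures are exponential (an illustrative assumption)."""
    rng = random.Random(seed)
    total_time = 0.0
    incompetent_time = 0.0
    for _ in range(holders):
        if rng.random() < p:
            # Competent: stays a time with mean mu, then is promoted.
            total_time += rng.expovariate(1.0 / mu)
        else:
            # Incompetent: stays a time with mean nu, then retires.
            tenure = rng.expovariate(1.0 / nu)
            total_time += tenure
            incompetent_time += tenure
    return incompetent_time / total_time


exact = incompetent_fraction(0.5, 1.0, 10.0)
estimate = simulate_incompetent_fraction(0.5, 1.0, 10.0)
print(exact, estimate)
```

Only the means $\mu$ and $\nu$ enter the limit, so replacing the exponential tenures with any other distributions having the same means should leave the long run fraction unchanged.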


Cumulative Processes 


Interpret $Y_i$ as a cost or value associated with the $i$th renewal cycle. A class
of problems with a natural setting in this general context of pairs $(X_i, Y_i)$,
where $X_i$ generates a renewal process, will now be considered. Interest
here focuses on the so-called cumulative process

$$W(t) = \sum_{k=1}^{N(t)+1} Y_k,$$

the accumulated costs or value up to time $t$ (assuming that transactions are
made at the beginning of a renewal cycle).

The elementary renewal theorem asserts in this case that

$$\lim_{t\to\infty} \frac{1}{t} E[W(t)] = \frac{E[Y_1]}{\mu}. \tag{5.6}$$

This equation justifies the interpretation of $E[Y_1]/\mu$ as a long run mean
cost or value per unit time, an interpretation that was used repeatedly in
the examples of Section 2.

Here are some examples of cumulative processes.


Replacement Models Suppose $Y_i$ is the cost of the $i$th replacement. Let
us suppose that under an age replacement strategy (see Section 3 and the
example entitled “Age Replacement Policies” in Section 4) a planned
replacement at age $T$ costs $c_1$ dollars, while a failure replaced at time $x < T$
costs $c_2$ dollars. If $Y_k$ is the cost incurred at the $k$th replacement cycle, then

$$Y_k = \begin{cases} c_1 & \text{with probability } 1 - F(T), \\ c_2 & \text{with probability } F(T), \end{cases}$$

and $E[Y_k] = c_1[1 - F(T)] + c_2 F(T)$. Since the expected length of a
replacement cycle is

$$E[\min\{X_k, T\}] = \int_0^T [1 - F(x)]\,dx,$$

we have that the long run cost per unit time is

$$\frac{c_1[1 - F(T)] + c_2 F(T)}{\int_0^T [1 - F(x)]\,dx},$$

and in any particular situation a routine calculus exercise or recourse to
numerical computation produces the value of $T$ that minimizes the long
run cost per unit time.
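
As a sketch of the "recourse to numerical computation" mentioned above, the following code evaluates the long run cost per unit time on a grid of replacement ages and picks the smallest value. The lifetime distribution $F(x) = 1 - \exp(-x^2)$, the costs $c_1 = 1$ and $c_2 = 5$, the grid, and all function names are assumptions chosen for illustration; none of them come from the text.

```python
import math


def age_replacement_cost(T, F, c1, c2, steps=2000):
    """Long run cost per unit time for planned replacement at age T.

    The expected cycle length, the integral of 1 - F(x) over [0, T],
    is approximated with the trapezoid rule.
    """
    h = T / steps
    total = 0.5 * ((1.0 - F(0.0)) + (1.0 - F(T)))
    for i in range(1, steps):
        total += 1.0 - F(i * h)
    cycle_length = total * h
    cycle_cost = c1 * (1.0 - F(T)) + c2 * F(T)
    return cycle_cost / cycle_length


def weibull_cdf(x):
    """Illustrative lifetime distribution with increasing failure rate."""
    return 1.0 - math.exp(-x * x)


# Crude grid search over candidate replacement ages 0.05, 0.10, ..., 4.00.
grid = [0.05 * i for i in range(1, 81)]
best_T = min(grid, key=lambda T: age_replacement_cost(T, weibull_cdf, 1.0, 5.0))
print(best_T, age_replacement_cost(best_T, weibull_cdf, 1.0, 5.0))
```

A grid search is used here only because it is easy to read; a one-dimensional optimizer, or the calculus condition obtained by differentiating the cost ratio, would refine the answer.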

Under a block replacement policy, there is one planned replacement
every $T$ units of time and, on the average, $M(T)$ failure replacements, so
the expected cost is $E[Y_k] = c_1 + c_2 M(T)$, and the long run mean cost per
unit time is $\{c_1 + c_2 M(T)\}/T$.


Risk Theory Suppose claims arrive at an insurance company according
to a renewal process with interoccurrence times $X_1, X_2, \ldots$. Let $Y_k$ be
the magnitude of the $k$th claim. Then $W(t) = \sum_{k=1}^{N(t)+1} Y_k$ represents the
cumulative amount claimed up to time $t$, and the long run mean claim rate is

$$\lim_{t\to\infty} \frac{1}{t} E[W(t)] = \frac{E[Y_1]}{E[X_1]}.$$


Maintaining Current Control of a Process A production process 
produces items one by one. At any instance, the process is in one of two 
possible states, which we label in-control and out-of-control. These states 
are not directly observable. Production begins with the process in-control, 
and it remains in-control for a random and unobservable length of time 
before a breakdown occurs, after which the process is out-of-control. A 
control chart is to be used to help detect when the out-of-control state oc- 
curs, so that corrective action may be taken. 

To be more specific, we assume that the quality of an individual item is
a normally distributed random variable having an unknown mean and a
known variance $\sigma^2$. If the process is in-control, the mean equals a standard
target, or design value, $\mu_0$. Process breakdown takes the form of a shift in
mean away from standard to $\mu_1 = \mu_0 \pm \delta\sigma$, where $\delta$ is the amount of the
shift in standard deviation units.

The Shewhart control chart method for maintaining process control 
calls for measuring the qualities of the items as they are produced and then 
plotting these qualities versus time on a chart that has lines drawn at the 
target value $\mu_0$ and above and below this target value at $\mu_0 \pm k\sigma$, where $k$
is a parameter of the control scheme being used. As long as the plotted
qualities fall inside these so-called action lines at $\mu_0 \pm k\sigma$, the process is
assumed to be operating in-control, but if ever a point falls outside these 
lines, the process is assumed to have left the in-control state, and investi- 
gation and repair are instituted. There are obviously two possible types of 
errors that can be made while thus controlling the process: (1) needless in- 
vestigation and repair when the process is in-control yet an observed qual- 
ity purely by chance falls outside the action lines and (2) continued oper- 
ation with the process out-of-control because the observed qualities are 
falling inside the action lines, again by chance. 

Our concern is the rational choice of the parameter k, that is, the ratio- 
nal spacing of the action lines, so as to balance, in some sense, these two 
possible errors. 

The probability that a single quality will fall outside the action lines
when the process is in-control is given by an appropriate area under the
normal density curve. Denoting this probability by $\alpha$, we have

$$\alpha = \Phi(-k) + 1 - \Phi(k) = 2\Phi(-k),$$

where $\Phi(x) = (2\pi)^{-1/2} \int_{-\infty}^{x} \exp(-y^2/2)\,dy$ is the standard cumulative
normal distribution function. Representative values are given in the
following table:


k        $\alpha$
1.645    0.10
1.96     0.05


Similarly, the probability, denoted by $p$, that a single point will fall
outside the action lines when the process is out-of-control is given by

$$p = \Phi(-\delta - k) + 1 - \Phi(-\delta + k).$$

Let $S$ denote the number of items inspected before an out-of-control signal
arises assuming that the process is out-of-control. Then $\Pr\{S = 1\} = p$,
$\Pr\{S = 2\} = (1 - p)p$, and in general, $\Pr\{S = n\} = (1 - p)^{n-1} p$. Thus, $S$
has a geometric distribution, and

$$E[S] = \frac{1}{p}.$$


Let T be the number of items produced while the process is in-control. We 
suppose that the mean operating time in-control E[T] is known from past 
records. 

The sequence of durations between detected and repaired out-of- 
control conditions forms a renewal process because each such duration 
begins with a newly repaired process and is a probabilistic replica of all 
other such intervals. It follows from the general elementary renewal 
theorem that the long run fraction of time spent out-of-control (O.C.) is 


$$\text{O.C.} = \frac{E[S]}{E[S] + E[T]} = \frac{1}{1 + pE[T]}.$$


The long run number of repairs per unit time is 


$$R = \frac{1}{E[S] + E[T]} = \frac{p}{1 + pE[T]}.$$

Let $N$ be the random number of “false alarms” while the process is
in-control, that is, during the time up to $T$, the first out-of-control. Then,
conditioned on $T$, the random variable $N$ has a binomial distribution with
probability parameter $\alpha$, and thus $E[N \mid T] = \alpha T$ and $E[N] = \alpha E[T]$.
Again, it follows from the general elementary renewal theorem that the
long run false alarms per unit time (F.A.) is

$$\text{F.A.} = \frac{E[N]}{E[S] + E[T]} = \frac{\alpha p E[T]}{1 + pE[T]}.$$


If each false alarm costs $c$ dollars, each repair costs $K$ dollars, and the
cost rate while operating out-of-control is $C$ dollars, then we have the long
run average cost per unit time of

$$\text{A.C.} = C(\text{O.C.}) + K(R) + c(\text{F.A.}) = \frac{C + Kp + c\alpha p E[T]}{1 + pE[T]}.$$

By trial and error one may now choose $k$, which determines $\alpha$ and $p$, so
as to minimize this average cost expression.
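
The trial and error search over $k$ can be sketched in a few lines. The code below computes $\alpha$, $p$, and the average cost from the formulas above, using the standard normal distribution function expressed through the error function. The shift $\delta = 1$, the mean in-control time $E[T] = 100$, the costs $C = 10$, $K = 5$, $c = 2$, the search grid, and the function names are illustrative assumptions, not values from the text.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def false_alarm_prob(k):
    """alpha: chance an in-control point falls outside the action lines."""
    return 2.0 * phi(-k)


def detection_prob(k, delta):
    """p: chance an out-of-control point falls outside the action lines."""
    return phi(-delta - k) + 1.0 - phi(-delta + k)


def average_cost(k, delta, mean_T, C, K, c):
    """Long run average cost per unit time, A.C."""
    alpha = false_alarm_prob(k)
    p = detection_prob(k, delta)
    return (C + K * p + c * alpha * p * mean_T) / (1.0 + p * mean_T)


# Illustrative parameter values (assumptions made for this sketch).
delta, mean_T, C, K, c = 1.0, 100.0, 10.0, 5.0, 2.0

# Trial and error over k = 0.50, 0.51, ..., 4.00.
grid = [0.01 * i for i in range(50, 401)]
best_k = min(grid, key=lambda k: average_cost(k, delta, mean_T, C, K, c))
print(best_k, average_cost(best_k, delta, mean_T, C, K, c))
```

The helper `false_alarm_prob` can also be used to reproduce the two table entries for $k = 1.645$ and $k = 1.96$.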


Exercises 


5.1. Jobs arrive at a certain service system according to a Poisson
process of rate $\lambda$. The server will accept an arriving customer only if it is
idle at the time of arrival. Potential customers arriving when the system is
busy are lost. Suppose that the service times are independent random
variables with mean service time $\mu$. Show that the long run fraction of time
that the server is idle is $1/(1 + \lambda\mu)$. What is the long run fraction of
potential customers that are lost?


5.2. The weather in a certain locale consists of alternating wet and dry 
spells. Suppose that the number of days in each rainy spell is Poisson dis- 
tributed with parameter 2, and that a dry spell follows a geometric distri- 
bution with a mean of 7 days. Assume that the successive durations of 
rainy and dry spells are statistically independent random variables. In the 
long run, what is the probability on a given day that it will be raining? 


5.3. Consider a light bulb whose life is a continuous random variable $X$
with probability density function $f(x)$, for $x > 0$. Assuming that one starts
with a fresh bulb and that each failed bulb is immediately replaced by a
new one, let $M(t) = E[N(t)]$ be the expected number of renewals up to
time $t$. Consider a block replacement policy (see Section 2.1) that replaces
each failed bulb immediately at a cost of $c$ per bulb and replaces all bulbs
at the fixed times $T, 2T, 3T, \ldots$. Let the block replacement cost per bulb
be $b < c$. Show that the long run total mean cost per bulb per unit time is

$$\Theta(T) = \frac{b + cM(T)}{T}.$$

Investigate the choice of a cost minimizing value $T^*$ when $M(t) =
t + 1 - \exp(-at)$.


Problems 


5.1. A certain type component has two states: 0 = OFF and 1 =
OPERATING. In state 0, the process remains there a random length of
time, which is exponentially distributed with parameter $\alpha$, and then moves
to state 1. The time in state 1 is exponentially distributed with parameter
$\beta$, after which the process returns to state 0.

The system has two of these components, A and B, with distinct
parameters:


Component    Operating Failure Rate    Repair Rate

A            $\beta_A$                 $\alpha_A$
B            $\beta_B$                 $\alpha_B$


In order for the system to operate, at least one of components A and B
must be operating (a parallel system). Assume that the component
stochastic processes are independent of one another.


(a) In the long run, what fraction of time is the system inoperational 
(not operating)? 

(b) Once the system enters the failed state, what is the mean duration 
there prior to returning to operation? 

(c) Define a cycle as the time between the instant that the system first
enters the failed state and the next such instant. Using renewal the- 
ory, find the mean duration of a cycle. 

(d) What is the mean system operating duration between successive 
system failures? 


5.2. The random lifetime $X$ of an item has a distribution function $F(x)$.
What is the mean total life $E[X \mid X > x]$ of an item of age $x$?


5.3. At the beginning of each period, customers arrive at a taxi stand at
times of a renewal process with distribution law $F(x)$. Assume an
unlimited supply of cabs, such as might occur at an airport. Suppose that each
customer pays a random fee at the stand following the distribution law
$G(x)$, for $x > 0$. Write an expression for the sum $W(t)$ of money collected
at the stand by time $t$, and then determine the limit expectation

$$\lim_{t\to\infty} \frac{E[W(t)]}{t}.$$


5.4. A lazy professor has a ceiling fixture in his office that contains two 
light bulbs. To replace a bulb, the professor must fetch a ladder, and being 
lazy, when a single bulb fails, he waits until the second bulb fails before 
replacing them both. Assume that the length of life of the bulbs are inde- 
pendent random variables. 


(a) If the lifetimes of the bulbs are exponentially distributed, with the 
same parameter, what fraction of time, in the long run, is our profes- 
sor’s office half lit? 


(b) What fraction of time, in the long run, is our professor’s office half 
lit if the bulbs that he buys have the same uniform (0, 1) lifetime 
distribution? 


6. Discrete Renewal Theory* 


In this section we outline the renewal theory that pertains to nonnegative 
integer-valued lifetimes. We emphasize renewal equations, the renewal 
argument, and the renewal theorem (Theorem 6.1). 


* The discrete renewal model is a special case in the general renewal theory presented 
in sections 1-5 and does not arise in later chapters. 


Consider a light bulb whose life, measured in discrete units, is a random
variable $X$ where $\Pr\{X = k\} = p_k$ for $k = 0, 1, \ldots$. If one starts with
a fresh bulb and if each bulb when it burns out is replaced by a new one,
then $M(n)$, the expected number of renewals (not including the initial
bulb) up to time $n$, solves the equation

$$M(n) = F_X(n) + \sum_{k=0}^{n} p_k M(n - k), \tag{6.1}$$

where $F_X(n) = p_0 + \cdots + p_n$ is the cumulative distribution function of the
random variable $X$. A vector or functional equation of the form (6.1) in the
unknowns $M(0), M(1), \ldots$ is termed a renewal equation. The equation is
established by a renewal argument, a first step analysis that proceeds by
conditioning on the life of the first bulb and then invoking the law of total
probability. In the case of (6.1), for example, if the first bulb fails at time
$k \le n$, then we have its failure plus, on the average, $M(n - k)$ additional
failures in the interval $[k, k + 1, \ldots, n]$. We weight this conditional mean
by the probability $p_k = \Pr\{X_1 = k\}$ and sum according to the law of total
probability to obtain

$$\begin{aligned} M(n) &= \sum_{k=0}^{n} [1 + M(n - k)]\,p_k \\ &= F_X(n) + \sum_{k=0}^{n} p_k M(n - k). \end{aligned}$$


Equation (6.1) is only a particular instance of what is called a renewal
equation. In general, a renewal equation is prescribed by a given bounded
sequence $\{b_k\}$ and takes the form

$$v_n = b_n + \sum_{k=0}^{n} p_k v_{n-k} \quad \text{for } n = 0, 1, \ldots. \tag{6.2}$$

The unknown variables are $v_0, v_1, \ldots$, and $p_0, p_1, \ldots$ is a probability
distribution for which, to avoid trivialities, we always assume $p_0 < 1$.

Let us first note that there is one and only one sequence $v_0, v_1, \ldots$
satisfying a renewal equation, because we may solve (6.2) successively to
get

$$v_0 = \frac{b_0}{1 - p_0}, \qquad v_1 = \frac{b_1 + p_1 v_0}{1 - p_0}, \tag{6.3}$$

and so on.
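
The successive solution in (6.3) translates directly into a short loop. The sketch below is one way to write it; the function name and the sample sequences `p` and `b` are hypothetical choices for illustration. At step $n$ the term $p_0 v_n$ is moved to the left-hand side, which is why the assumption $p_0 < 1$ is needed.

```python
def solve_renewal_equation(b, p):
    """Return [v_0, ..., v_{len(b)-1}] with v_n = b_n + sum_{k=0}^n p_k v_{n-k}."""
    if not p or p[0] >= 1.0:
        raise ValueError("need a probability distribution with p_0 < 1")
    v = []
    for n, b_n in enumerate(b):
        # Terms with k >= 1 involve only values v_{n-k} already computed.
        known = sum(p[k] * v[n - k] for k in range(1, min(n, len(p) - 1) + 1))
        # Move the k = 0 term p_0 * v_n to the left-hand side and divide.
        v.append((b_n + known) / (1.0 - p[0]))
    return v


p = [0.1, 0.4, 0.5]          # illustrative lifetime distribution (assumption)
b = [1.0, 0.5, 0.25, 0.125]  # illustrative bounded sequence (assumption)
v = solve_renewal_equation(b, p)
print(v)
```

Because each $v_n$ is determined by the earlier values, the loop also makes the uniqueness claim visible: there is never a choice to make.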

Let $u_n$ be the mean number of renewals that take place exactly in period
$n$. When $p_0 = 0$, so that the lifetimes are strictly positive and at most one
renewal can occur in any period, then $u_n$ is the probability that a single
renewal occurs in period $n$. The sequence $u_0, u_1, \ldots$ satisfies a renewal
equation that is of fundamental importance in the general theory. Let

$$\delta_n = \begin{cases} 1 & \text{for } n = 0, \\ 0 & \text{for } n > 0. \end{cases} \tag{6.4}$$

Then $\{u_n\}$ satisfies the renewal equation

$$u_n = \delta_n + \sum_{k=0}^{n} p_k u_{n-k} \quad \text{for } n = 0, 1, \ldots. \tag{6.5}$$

Again, equation (6.5) is established via a renewal argument. First,
observe that $\delta_n$ counts the initial bulb, the renewal at time 0. Next, condition
on the lifetime of this first bulb. If it fails in period $k \le n$, which occurs
with probability $p_k$, then the process begins afresh and the conditional
probability of a renewal in period $n$ becomes $u_{n-k}$. Weighting the
contingency represented by $u_{n-k}$ by its respective probability $p_k$ and summing
according to the law of total probability then yields (6.5).

The next lemma shows how the solution $\{v_n\}$ to the general renewal
equation (6.2) can be expressed in terms of the solution $\{u_n\}$ to the
particular equation (6.5).


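Lemma 6.1 If $\{v_n\}$ satisfies (6.2) and $\{u_n\}$ satisfies (6.5), then

$$v_n = \sum_{k=0}^{n} b_{n-k} u_k \quad \text{for } n = 0, 1, \ldots.$$

Proof In view of our remarks on the existence and uniqueness of
solutions to equation (6.2), we need only verify that $v_n = \sum_{k=0}^{n} b_{n-k} u_k$ satisfies
(6.2). We have

$$\begin{aligned}
v_n &= \sum_{k=0}^{n} b_{n-k} u_k = \sum_{k=0}^{n} b_{n-k} \Big\{ \delta_k + \sum_{l=0}^{k} p_{k-l} u_l \Big\} \\
&= b_n + \sum_{k=0}^{n} \sum_{l=0}^{k} b_{n-k}\, p_{k-l}\, u_l \\
&= b_n + \sum_{l=0}^{n} \sum_{k=l}^{n} b_{n-k}\, p_{k-l}\, u_l \\
&= b_n + \sum_{l=0}^{n} \sum_{j=0}^{n-l} p_j\, b_{n-j-l}\, u_l \\
&= b_n + \sum_{j=0}^{n} p_j \sum_{l=0}^{n-j} b_{n-j-l}\, u_l \\
&= b_n + \sum_{j=0}^{n} p_j v_{n-j}.
\end{aligned}$$

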
Example Let $X_1, X_2, \ldots$ be the successive lifetimes of the bulbs and let
$W_0 = 0$ and $W_n = X_1 + \cdots + X_n$ be the replacement times. We assume
that $p_0 = \Pr\{X_1 = 0\} = 0$. The number of replacements (not including the
initial bulb) up to time $n$ is given by

$$N(n) = k \quad \text{for } W_k \le n < W_{k+1}.$$

Then $M(n) = E[N(n)]$ satisfies the renewal equation (6.1),

$$M(n) = p_0 + \cdots + p_n + \sum_{k=0}^{n} p_k M(n - k),$$

and elementary algebra shows that $m_n = E[N(n) + 1] = M(n) + 1$ satisfies

$$m_n = 1 + \sum_{k=0}^{n} p_k m_{n-k} \quad \text{for } n = 0, 1, \ldots. \tag{6.6}$$

Then (6.6) is a renewal equation for which $b_n = 1$ for all $n$. In view of
Lemma 6.1 we conclude that

$$m_n = \sum_{k=0}^{n} 1 \cdot u_k = u_0 + \cdots + u_n.$$

Conversely, $u_n = m_n - m_{n-1} = M(n) - M(n - 1)$.

To continue with the example, let $g_n = E[W_{N(n)+1}]$. The definition is
illustrated in Figure 6.1. We will argue that $g_n$ satisfies a certain renewal
equation. As shown in Figure 6.1, $W_{N(n)+1}$ always includes the first renewal
duration $X_1$. In addition, if $X_1 = k \le n$, which occurs with probability $p_k$,
then the conditional mean of the added lives constituting $W_{N(n)+1}$ is $g_{n-k}$.
Weighting these conditional means by their respective probabilities and
summing according to the law of total probability then gives

$$g_n = E[X_1] + \sum_{k=0}^{n} g_{n-k}\, p_k.$$

Hence, by Lemma 6.1,

$$g_n = \sum_{k=0}^{n} E[X_1]\, u_k = E[X_1]\, m_n.$$

We get the interesting formula [see (1.7)]

$$E[X_1 + \cdots + X_{N(n)+1}] = E[X_1] \times E[N(n) + 1]. \tag{6.7}$$

Note that $N(n)$ is not independent of $\{X_k\}$, and yet (6.7) still prevails.


Figure 6.1 $W_{N(n)+1}$ always contains $X_1$ and contains additional durations when
$X_1 = k \le n$.


6.1. The Discrete Renewal Theorem


The renewal theorem provides conditions under which the solution $\{v_n\}$ to
a renewal equation will converge as $n$ grows large. Certain periodic
behavior, such as failures occurring only at even ages, must be precluded,
and the simplest assumption assuring this preclusion is that $p_1 > 0$.


Theorem 6.1 Suppose that $0 < p_1 < 1$ and that $\{u_n\}$ and $\{v_n\}$ are the
solutions to the renewal equations (6.5) and (6.2), respectively. Then
(a) $\lim_{n\to\infty} u_n = 1/\sum_{k=0}^{\infty} k p_k$; and (b) if $\sum_{k=0}^{\infty} |b_k| < \infty$, then
$\lim_{n\to\infty} v_n = \{\sum_{k=0}^{\infty} b_k\}/\{\sum_{k=0}^{\infty} k p_k\}$.


We recognize that $\sum_{k=0}^{\infty} k p_k = E[X_1]$ is the mean lifetime of a unit. Thus
(a) in Theorem 6.1 asserts that in the long run, the probability of a renewal
occurring in a given interval is one divided by the mean life of a unit.


Remark Theorem 6.1 holds in certain circumstances when $p_1 = 0$. It
suffices to assume that the greatest common divisor of the integers $k$ for
which $p_k > 0$ is one.
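
Both limits in Theorem 6.1 are easy to observe numerically. The sketch below solves the renewal equation forward for a hypothetical lifetime distribution with $p_1 > 0$, once with $b_n = \delta_n$ to obtain $\{u_n\}$ and once with the summable sequence $b_n = 2^{-n}$. The distribution, the choice of $b_n$, the horizon of 80 periods, and the function name are assumptions made for illustration.

```python
def renewal_solution(b, p, n_max):
    """Solve v_n = b(n) + sum_{k=0}^n p_k v_{n-k} for n = 0..n_max (b is a callable)."""
    v = []
    for n in range(n_max + 1):
        known = sum(p[k] * v[n - k] for k in range(1, min(n, len(p) - 1) + 1))
        v.append((b(n) + known) / (1.0 - p[0]))
    return v


p = [0.0, 0.2, 0.5, 0.3]                              # illustrative, with p_1 > 0
mean_life = sum(k * p_k for k, p_k in enumerate(p))   # E[X_1] = 2.1

# Part (a): b_n = delta_n gives the renewal sequence {u_n}.
u = renewal_solution(lambda n: 1.0 if n == 0 else 0.0, p, 80)

# Part (b): b_n = (1/2)^n is absolutely summable with sum 2.
v = renewal_solution(lambda n: 0.5 ** n, p, 80)

print(u[80], 1.0 / mean_life)
print(v[80], 2.0 / mean_life)
```

For this choice of $p$ the values settle quickly, so by $n = 80$ the computed terms agree with the limits $1/E[X_1]$ and $\sum_k b_k / E[X_1]$ to many decimal places.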


Example Let $\gamma_n = W_{N(n)+1} - n$ be the excess life at time $n$. For a fixed
integer $m$, let $f_n(m) = \Pr\{\gamma_n = m\}$. We will establish a renewal equation
for $f_n(m)$ by conditioning on the first life $X_1$. For $m \ge 1$,

$$\Pr\{\gamma_n = m \mid X_1 = k\} = \begin{cases} f_{n-k}(m) & \text{if } 0 \le k \le n, \\ 1 & \text{if } k = n + m, \\ 0 & \text{otherwise.} \end{cases}$$

(The student is urged to diagram the alternatives arising in
$\Pr\{\gamma_n = m \mid X_1 = k\}$.) Then, by the law of total probability,

$$\begin{aligned} f_n(m) = \Pr\{\gamma_n = m\} &= \sum_{k=0}^{\infty} \Pr\{\gamma_n = m \mid X_1 = k\}\, p_k \\ &= p_{m+n} + \sum_{k=0}^{n} f_{n-k}(m)\, p_k. \end{aligned} \tag{6.8}$$

We apply Theorem 6.1 with $b_n = p_{m+n}$ to conclude that

$$\lim_{n\to\infty} \Pr\{\gamma_n = m\} = \frac{\sum_{k=0}^{\infty} p_{m+k}}{\sum_{k=0}^{\infty} k p_k} = \frac{\Pr\{X_1 \ge m\}}{E[X_1]}, \quad m = 1, 2, \ldots.$$

The limit is a bona fide probability mass function, since its terms sum
to one:

$$\sum_{m=1}^{\infty} \frac{\Pr\{X_1 \ge m\}}{E[X_1]} = \frac{E[X_1]}{E[X_1]} = 1.$$


6.2. Deterministic Population Growth with Age Distribution 


In this section we will discuss a simple deterministic model of population 
growth that takes into account the age structure of the population. Sur- 
prisingly, the discrete renewal theorem (Theorem 6.1) will play a role in 
the analysis. As the language will suggest, the deterministic model that we 
treat may be viewed as describing the mean population size in a more 
elaborate stochastic model that is beyond our scope to develop fully. 


A Simple Growth Model Let us set the stage by reviewing a simple
model that has no age structure. We consider a single species evolving in
discrete time $t = 0, 1, 2, \ldots$, and we let $N_t$ be the population size at time
$t$. We assume that each individual present in the population at time $t$ gives
rise to a constant number $\lambda$ of offspring that form the population at time
$t + 1$. (If death does not occur in the model, then we include the parent as
one of the offspring, and then necessarily $\lambda \ge 1$.) If $N_0$ is the initial
population size, and each individual gives rise to $\lambda$ offspring, then

$$N_1 = \lambda N_0, \qquad N_2 = \lambda N_1 = \lambda^2 N_0,$$

and in general,

$$N_t = \lambda^t N_0. \tag{6.9}$$

If $\lambda > 1$, then the population grows indefinitely in time; if $\lambda < 1$, then the
population dies out; while if $\lambda = 1$, then the population size remains
constant at $N_t = N_0$ for all $t = 0, 1, \ldots$.


The Model with Age Structure We shall now introduce an age
structure in the population. We need the following notation:


$n_{u,t}$ = the number of individuals of age $u$ in the population at time $t$;

$N_t = \sum_{u=0}^{\infty} n_{u,t}$ = the total number of individuals in the population at
time $t$;

$b_t$ = the number of new individuals created in the population at time
$t$, the number of births;

$\beta_u$ = the expected number of progeny of a single individual of age $u$
in one time period;

$l_u$ = the probability that an individual will survive, from birth, at
least to age $u$.


The conditional probability that an individual survives at least to age $u$,
given that it has survived to age $u - 1$, is simply the ratio $l_u/l_{u-1}$. The net
maternity function is the product

$$m_u = l_u \beta_u$$

and is the birth rate adjusted for the death of some fraction of the
population. That is, $m_u$ is the expected number of offspring at age $u$ of an
individual now of age 0.

Let us derive the total progeny of a single individual during its lifespan.
An individual survives at least to age $u$ with probability $l_u$, and then
during the next unit of time gives rise to $\beta_u$ offspring. Summing $l_u \beta_u = m_u$
over all ages $u$ then gives the total progeny of a single individual:

$$M = \sum_{u=0}^{\infty} l_u \beta_u = \sum_{u=0}^{\infty} m_u. \tag{6.10}$$

If $M > 1$, then we would expect the population to increase over time; if
$M < 1$, then we would expect the population to decrease; while if $M = 1$,
then the population size should neither increase nor decrease in the long
run. This is indeed the case, but the exact description of the population
evolution is more complex, as we will now determine.

In considering the effect of age structure on a growing population, our
interest will center on $b_t$, the number of new individuals created in the
population at time $t$. We regard $\beta_u$, $l_u$, and $n_{u,0}$ as known, and the problem
is to determine $b_t$ for $t \ge 0$. Once $b_t$ is known, then $n_{u,t}$ and $N_t$ may be
determined according to, for example,

$$n_{0,1} = b_1, \tag{6.11}$$

$$n_{u,1} = n_{u-1,0}\, \frac{l_u}{l_{u-1}} \quad \text{for } u \ge 1, \tag{6.12}$$

and

$$N_1 = \sum_{u=0}^{\infty} n_{u,1}. \tag{6.13}$$

In the first of these simple relations, $n_{0,1}$ is the number in the population
at time 1 of age 0, which obviously is the same as $b_1$, those born in the
population at time 1. For the second equation, $n_{u,1}$ is the number in the
population at time 1 of age $u$. These individuals must have survived from
the $n_{u-1,0}$ individuals in the population at time 0 of age $u - 1$; the
conditional probability of survivorship is $[l_u/l_{u-1}]$, which explains the second
equation. The last relation simply asserts that the total population size
results by summing the numbers of individuals of all ages. The
generalizations of (6.11) through (6.13) are

$$n_{0,t} = b_t, \tag{6.14}$$

$$n_{u,t} = n_{u-1,t-1}\, \frac{l_u}{l_{u-1}} \quad \text{for } u \ge 1, \tag{6.15}$$

and

$$N_t = \sum_{u=0}^{\infty} n_{u,t} \quad \text{for } t \ge 1. \tag{6.16}$$


Having explained how $n_{u,t}$ and $N_t$ are found once $b_t$ is known, we turn
to determining $b_t$. The number of individuals created at time $t$ has two
components. One component, $a_t$, say, counts the offspring of those
individuals in the population at time $t$ who already existed at time 0. In the
simplest case, the population begins at time $t = 0$ with a single ancestor
of age $u = 0$, and then the number of offspring of this individual at time $t$
is $a_t = m_t$, the net maternity function. More generally, assume that there
were $n_{u,0}$ individuals of age $u$ at time 0. The probability that an individual
of age $u$ at time 0 will survive to time $t$ (at which time it will be of age
$t + u$) is $l_{t+u}/l_u$. Hence the number of individuals of age $u$ at time 0 that
survive to time $t$ is $n_{u,0}(l_{t+u}/l_u)$, and each of these individuals, now of age
$t + u$, will produce $\beta_{t+u}$ new offspring. Adding over all ages we obtain

$$a_t = \sum_{u=0}^{\infty} \beta_{t+u}\, n_{u,0}\, \frac{l_{t+u}}{l_u} = \sum_{u=0}^{\infty} \frac{m_{t+u}\, n_{u,0}}{l_u}. \tag{6.17}$$

The second component of $b_t$ counts those individuals created at time $t$
whose parents were not initially in the population but were born after time
0. Now, the number of individuals created at time $\tau$ is $b_\tau$. The probability
that one of these individuals survives to time $t$, at which time he will be
of age $t - \tau$, is $l_{t-\tau}$. The rate of births for individuals of age $t - \tau$ is $\beta_{t-\tau}$.
The second component results from summing over $\tau$ and gives

$$b_t = a_t + \sum_{\tau=0}^{t} \beta_{t-\tau}\, l_{t-\tau}\, b_\tau = a_t + \sum_{\tau=0}^{t} m_{t-\tau}\, b_\tau. \tag{6.18}$$


Example Consider an organism that produces two offspring at age 1, and two more at age 2, and then dies. The population begins with a single organism of age 0 at time 0. We have the data

$$n_{0,0} = 1, \quad n_{u,0} = 0 \ \text{for } u \ge 1,$$
$$\beta_1 = \beta_2 = 2,$$
$$l_0 = l_1 = l_2 = 1 \quad \text{and} \quad l_u = 0 \ \text{for } u > 2.$$

We calculate from (6.17) that

$$a_0 = 0, \quad a_1 = 2, \quad a_2 = 2, \quad \text{and} \quad a_t = 0 \ \text{for } t > 2.$$

Finally, (6.18) is solved recursively as

$$b_0 = 0,$$
$$b_1 = a_1 + m_0 b_1 + m_1 b_0 = 2 + 0 + 0 = 2,$$
$$b_2 = a_2 + m_0 b_2 + m_1 b_1 + m_2 b_0 = 2 + 0 + (2)(2) + 0 = 6,$$
$$b_3 = a_3 + m_0 b_3 + m_1 b_2 + m_2 b_1 + m_3 b_0 = 0 + 0 + (2)(6) + (2)(2) + 0 = 16.$$


Thus, for example, an individual of age 0 at time 0 gives rise to 16 new 
individuals entering the population at time 3. 
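
The recursion in (6.18) is easy to mechanize. The following is a minimal Python sketch, not part of the original text; the function name `births` and the list-based representation of $a_t$ and $m_t$ are illustrative assumptions, and the code assumes $m_0 < 1$ so that the $\tau = t$ term can be moved to the left-hand side.

```python
def births(a, m, horizon):
    """Solve b_t = a_t + sum_{tau=0}^{t} m_{t-tau} * b_tau for t = 0..horizon.

    `a` and `m` are lists indexed by time/age; entries beyond the end of a
    list are treated as zero. Assumes m[0] < 1 (hypothetical helper).
    """
    def get(seq, i):
        return seq[i] if i < len(seq) else 0

    b = []
    for t in range(horizon + 1):
        # Terms with tau < t involve births that are already known.
        rhs = get(a, t) + sum(get(m, t - tau) * b[tau] for tau in range(t))
        # The tau = t term is m_0 * b_t; solve the linear equation for b_t.
        b.append(rhs / (1 - get(m, 0)))
    return b


# Data of the example: m_1 = m_2 = 2, and a_t = m_t for one ancestor of age 0.
print(births(a=[0, 2, 2], m=[0, 2, 2], horizon=3))  # [0.0, 2.0, 6.0, 16.0]
```

The printed values agree with $b_0 = 0$, $b_1 = 2$, $b_2 = 6$, and $b_3 = 16$ found above.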


The Long Run Behavior 


Somewhat surprisingly, since no “renewals” are readily apparent, the discrete renewal theorem (Theorem 6.1) will be invoked to deduce the long run behavior of this age-structured population model. Observe that (6.18)

$$b_t = a_t + \sum_{\tau=0}^{t} m_{t-\tau}\, b_\tau = a_t + \sum_{v=0}^{t} m_v\, b_{t-v} \tag{6.19}$$

has the form of a renewal equation except that $\{m_v\}$ is not necessarily a bona fide probability distribution in that, typically, $\{m_v\}$ will not sum to one. Fortunately, there is a trick that overcomes this difficulty. We introduce a variable $s$, whose value will be chosen later, and let

$$m_v^* = m_v s^v, \quad b_v^* = b_v s^v, \quad \text{and} \quad a_v^* = a_v s^v.$$

Now multiply (6.19) by $s^t$ and observe that $s^t m_v b_{t-v} = (m_v s^v)(b_{t-v} s^{t-v}) = m_v^* b_{t-v}^*$ to get

$$b_t^* = a_t^* + \sum_{v=0}^{t} m_v^*\, b_{t-v}^*. \tag{6.20}$$

This renewal equation holds no matter what value we choose for $s$. We therefore choose $s$ such that $\{m_v^*\}$ is a bona fide probability distribution. That is, we fix the value of $s$ such that

$$\sum_{v=0}^{\infty} m_v^* = \sum_{v=0}^{\infty} m_v s^v = 1.$$

There is always a unique such $s$ whenever $1 < \sum_{v=0}^{\infty} m_v < \infty$. We may now apply the renewal theorem to (6.20), provided that its hypothesis concerning nonperiodic behavior is satisfied. For this it suffices, for example, that $m_1 > 0$. Then we conclude that

$$\lim_{t \to \infty} b_t^* = \lim_{t \to \infty} b_t s^t = \frac{\sum_{v=0}^{\infty} a_v^*}{\sum_{v=0}^{\infty} v\, m_v^*}. \tag{6.21}$$

We set $\lambda = 1/s$ and $K = \sum_{v=0}^{\infty} a_v^* / \sum_{v=0}^{\infty} v\, m_v^*$ to write (6.21) in the form

$$b_t \sim K \lambda^t \quad \text{for } t \text{ large}.$$


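In words, asymptotically, the population grows at rate $\lambda$, where $\lambda = 1/s$ is the solution to

$$\sum_{v=0}^{\infty} m_v \lambda^{-v} = 1.$$

When $t \ge u$, iterating (6.15) gives

$$n_{u,t} = n_{u-1,t-1}\left[\frac{l_u}{l_{u-1}}\right] = n_{u-2,t-2}\left[\frac{l_{u-1}}{l_{u-2}}\right]\left[\frac{l_u}{l_{u-1}}\right] = \cdots = n_{0,t-u}\, l_u = b_{t-u}\, l_u,$$

where the last two steps use $l_0 = 1$ and (6.14). This simply expresses that those of age $u$ at time $t$ were born $t - u$ time units ago and survived. Since for large $t$ we have $b_{t-u} \sim K \lambda^{t-u}$, then

$$n_{u,t} \sim K\, l_u \lambda^{t-u} = K (l_u \lambda^{-u}) \lambda^t,$$

$$N_t = \sum_{u=0}^{\infty} n_{u,t} \sim K \sum_{u=0}^{\infty} (l_u \lambda^{-u}) \lambda^t,$$

and

$$\lim_{t \to \infty} \frac{n_{u,t}}{N_t} = \frac{l_u \lambda^{-u}}{\sum_{v=0}^{\infty} l_v \lambda^{-v}}.$$

This last expression furnishes the asymptotic, or stable, age distribution in the population.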


Example Continuing the example in which $m_1 = m_2 = 2$ and $m_k = 0$ otherwise, then we have

$$\sum_{v=0}^{\infty} m_v s^v = 2s + 2s^2 = 1,$$

which we solve to obtain

$$s = \frac{-2 \pm \sqrt{4 + 8}}{4} = \frac{-1 \pm \sqrt{3}}{2} = (0.366, -1.366).$$

The relevant root is $s = 0.366$, whence $\lambda = 1/s = 2.732$. Thus asymptotically, the population grows geometrically at rate $\lambda = 2.732\ldots$, and the stable age distribution is as shown in the following table:

Age    Fraction of Population
0      $1/(1 + s + s^2) = 0.6667$
1      $s/(1 + s + s^2) = 0.2440$
2      $s^2/(1 + s + s^2) = 0.0893$
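
For net maternity functions where the equation $\sum_v m_v s^v = 1$ has no closed-form root, the same quantities can be found numerically. The sketch below is an illustration rather than the text's method: it uses plain bisection on $(0, 1)$, and the names `growth_root` and `stable_age_distribution` are hypothetical. It assumes $m_0 < 1 < \sum_v m_v$, so that a root exists in that interval.

```python
def growth_root(m, tol=1e-12):
    """Bisection for the s in (0, 1) with sum_v m[v] * s**v = 1.

    Assumes m[0] < 1 < sum(m), so the left side minus 1 changes sign once.
    """
    def f(s):
        return sum(mv * s**v for v, mv in enumerate(m)) - 1.0

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def stable_age_distribution(l, lam):
    """Normalize the weights l_u * lam**(-u) over the ages u = 0, 1, ..."""
    weights = [l_u * lam ** (-u) for u, l_u in enumerate(l)]
    total = sum(weights)
    return [w / total for w in weights]


s = growth_root([0, 2, 2])      # m_1 = m_2 = 2
lam = 1 / s                     # asymptotic growth rate
ages = stable_age_distribution([1, 1, 1], lam)
print(round(s, 3), round(lam, 3), [round(p, 4) for p in ages])
```

With the data of the example this reproduces $s \approx 0.366$, $\lambda \approx 2.732$, and the fractions in the table.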


Exercises 


6.1. Solve for $v_n$ for $n = 0, 1, \ldots, 10$ in the renewal equation

$$v_n = b_n + \sum_{k=0}^{n} p_k v_{n-k} \quad \text{for } n = 0, 1, \ldots,$$


where b, = b, = +, b) = b, = +++ = 0, and p, = 4, p, = 3, and p, = i. 
6.2. (Continuation of Exercise 6.1)

(a) Solve for $u_n$ for $n = 0, 1, \ldots, 10$ in the renewal equation

$$u_n = \delta_n + \sum_{k=0}^{n} p_k u_{n-k} \quad \text{for } n = 0, 1, \ldots,$$

where $\delta_0 = 1$, $\delta_1 = \delta_2 = \cdots = 0$, and $\{p_k\}$ is as defined in Exercise 6.1.

(b) Verify that the solution $v_n$ in Exercise 6.1 and $u_n$ are related according to $v_n = \sum_{k=0}^{n} b_k u_{n-k}$.


6.3. Using the data of Exercises 6.1 and 6.2, determine

(a) $\lim_{n \to \infty} u_n$.
(b) $\lim_{n \to \infty} v_n$.


Problems 


6.1. Suppose the lifetimes $X_1, X_2, \ldots$ have the geometric distribution

$$\Pr\{X_1 = k\} = \alpha(1 - \alpha)^{k-1} \quad \text{for } k = 1, 2, \ldots,$$

where $0 < \alpha < 1$.

(a) Determine $u_n$ for $n = 1, 2, \ldots$.
(b) Determine the distribution of excess life $\gamma_n$ by using Lemma 6.1 and (6.8).


6.2. Marlene has a fair die with the usual six sides. She throws the die 
and records the number. She throws the die again and adds the second 
number to the first. She repeats this until the cumulative sum of all the
tosses first exceeds a prescribed number n. (a) When n = 10, what is the 
probability that she stops at a cumulative sum of 13? (b) When n is large, 
what is the approximate probability that she stops at a sum of n + 3? 


6.3. Determine the long run population growth rate for a population whose individual net maternity function is $m_2 = m_3 = 2$, and $m_k = 0$ otherwise. Why does delaying the age at which offspring are first produced cause a reduction in the population growth rate? (The population growth rate when $m_1 = m_2 = 2$, and $m_k = 0$ otherwise was determined in the last example of this section.)


6.4. Determine the long run population growth rate for a population whose individual net maternity function is $m_0 = m_1 = 0$ and $m_2 = m_3 = \cdots = a > 0$. Compare this with the population growth rate when $m_2 = a$, and $m_k = 0$ for $k \ne 2$.


Chapter VIII 


Brownian Motion and 
Related Processes 


1. Brownian Motion and Gaussian Processes 


The Brownian motion stochastic process arose early in this century as an 
attempt to explain the ceaseless irregular motions of tiny particles sus- 
pended in a fluid, such as dust motes floating in air. Today, the Brownian 
motion process and its many generalizations and extensions occur in nu- 
merous and diverse areas of pure and applied science such as economics, 
communication theory, biology, management science, and mathematical 
statistics.


1.1. A Little History 


The story begins in the summer of 1827, when the English botanist Robert 
Brown observed that microscopic pollen grains suspended in a drop of 
water moved constantly in haphazard zigzag trajectories. Following the 
reporting of his findings, other scientists verified the strange phenomenon. 
Similar Brownian motion was apparent whenever very small particles 
were suspended in a fluid medium, for example, smoke particles in air. 
Over time, it was established that finer particles move more rapidly, that 
the motion is stimulated by heat, and that the movement becomes more 
active with a decrease in fluid viscosity.

A satisfactory explanation had to wait until the next century, when in
1905, Einstein would assert that the Brownian motion originates in the 
continual bombardment of the pollen grains by the molecules of the sur- 
rounding water, with successive molecular impacts coming from different 
directions and contributing different impulses to the particles. Einstein ar- 
gued that as a result of the continual collisions, the particles themselves 
had the same average kinetic energy as the molecules. Belief in molecules 
and atoms was far from universal in 1905, and the success of Einstein’s 
explanation of the well-documented existence of Brownian motion did 
much to convince a number of distinguished scientists that such things as 
atoms actually exist. Incidentally, 1905 is the same year in which Einstein 
set forth his theory of relativity and his quantum explanation for the pho- 
toelectric effect. Any single one of his 1905 contributions would have 
brought him recognition by his fellow physicists. Today, a search in a uni- 
versity library under the subject heading “Brownian motion” is likely to 
turn up dozens of books on the stochastic process called Brownian motion 
and few, if any, on the irregular movements observed by Robert Brown. 
The literature on the model has far surpassed and overwhelmed the liter- 
ature on the phenomenon itself! 

Brownian motion is complicated because the molecular bombardment 
of the pollen grain is itself a complicated process, so it is not surprising that 
it took more than another decade to get a clear picture of the Brownian mo- 
tion stochastic process. It was not until 1923 that Norbert Wiener set forth 
the modern mathematical foundation. The reader may also encounter 
“Wiener process” or “Wiener—Einstein process” as names for the stochas- 
tic process that we will henceforth simply call “Brownian motion.” 

Predating Einstein by several years, in 1900 in Paris, Louis Bachelier 
proposed what we would now call a “Brownian motion model” for the 
movement of prices in the French bond market. While Bachelier’s paper 
was largely ignored by academics for many decades, his work now stands 
as the innovative first step in a mathematical theory of stock markets that 
has greatly altered the financial world of today. Later in this chapter, we will 
have much to say about Brownian motion and related models in finance. 


1.2. The Brownian Motion Stochastic Process 


In terms of our general framework of stochastic processes (cf. I, Section 1.1), the Brownian motion process is an example of a continuous-time, continuous-state-space Markov process. Let $B(t)$ be the $y$ component (as a function of time) of a particle in Brownian motion. Let $x$ be the position of the particle at time $t_0$; i.e., $B(t_0) = x$. Let $p(y, t|x)$ be the probability density function, in $y$, of $B(t_0 + t)$, given that $B(t_0) = x$. We postulate that the probability law governing the transitions is stationary in time, and therefore $p(y, t|x)$ does not depend on the initial time $t_0$.

Since $p(y, t|x)$ is a probability density function in $y$, we have the properties

$$p(y, t|x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} p(y, t|x)\, dy = 1. \tag{1.1}$$

Further, we stipulate that $B(t_0 + t)$ is likely to be near $B(t_0) = x$ for small values of $t$. This is done formally by requiring that

$$\lim_{t \to 0} p(y, t|x) = 0 \quad \text{for } y \ne x. \tag{1.2}$$

From physical principles, Einstein showed that $p(y, t|x)$ must satisfy the partial differential equation

$$\frac{\partial p}{\partial t} = \frac{1}{2}\sigma^2 \frac{\partial^2 p}{\partial x^2}. \tag{1.3}$$

This is called the diffusion equation, and $\sigma^2$ is the diffusion coefficient, which Einstein showed to be given by $\sigma^2 = RT/Nf$, where $R$ is the gas constant, $T$ is the temperature, $N$ is Avogadro’s number, and $f$ is a coefficient of friction. By choosing the proper scale, we may take $\sigma^2 = 1$. With this choice we can verify directly (see Exercise 1.3) that

$$p(y, t|x) = \frac{1}{\sqrt{2\pi t}} \exp\left(-\frac{1}{2t}(y - x)^2\right) \tag{1.4}$$

is a solution of (1.3). In fact, it is the only solution under the conditions (1.1) and (1.2). We recognize (1.4) as a normal probability density function whose mean is $x$ and whose variance is $t$. That is, the position of the particle $t$ time units after observations begin is normally distributed. The mean position is the initial location $x$, and the variance is the time of observation $t$.

Because the normal distribution will appear over and over in this chapter, we are amply justified in standardizing some notation to deal with it. Let

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty, \tag{1.5}$$

be the standard normal probability density function, and let

$$\Phi(z) = \int_{-\infty}^{z} \phi(x)\, dx \tag{1.6}$$

be the corresponding cumulative distribution function. A small table of the cumulative normal distribution appears at the end of this section (see Table 8.1). Let

$$\phi_t(z) = \frac{1}{\sqrt{t}}\, \phi\left(\frac{z}{\sqrt{t}}\right), \tag{1.7}$$

and

$$\Phi_t(z) = \int_{-\infty}^{z} \phi_t(x)\, dx = \Phi\left(\frac{z}{\sqrt{t}}\right) \tag{1.8}$$

be the probability density function and cumulative distribution function, respectively, for the normal distribution with mean zero and variance $t$. In this notation, the transition density in (1.4) is given by

$$p(y, t|x) = \phi_t(y - x), \tag{1.9}$$

and

$$\Pr\{B(t) \le y \mid B(0) = x\} = \Phi\left(\frac{y - x}{\sqrt{t}}\right).$$

The transition probability density function in (1.4) or (1.9) gives only the probability distribution of $B(t) - B(0)$. The complete description of the Brownian motion process with diffusion coefficient $\sigma^2$ is given by the following definition.


Definition Brownian motion with diffusion coefficient $\sigma^2$ is a stochastic process $\{B(t); t \ge 0\}$ with the properties:

(a) Every increment $B(s + t) - B(s)$ is normally distributed with mean zero and variance $\sigma^2 t$; $\sigma^2 > 0$ is a fixed parameter.

(b) For every pair of disjoint time intervals $(t_1, t_2]$, $(t_3, t_4]$, with $0 \le t_1 < t_2 \le t_3 < t_4$, the increments $B(t_4) - B(t_3)$ and $B(t_2) - B(t_1)$ are independent random variables, and similarly for $n$ disjoint time intervals, where $n$ is an arbitrary positive integer.

(c) $B(0) = 0$, and $B(t)$ is continuous as a function of $t$.


The definition says that a displacement $B(s + t) - B(s)$ is independent of the past, or alternatively, if we know $B(s) = x$, then no further knowledge of the values of $B(\tau)$ for past times $\tau < s$ has any effect on our knowledge of the probability law governing the future movement $B(s + t) - B(s)$. This is a statement of the Markov character of the process. We emphasize, however, that the independent increment assumption (b) is actually more restrictive than the Markov property. A typical Brownian motion path is illustrated in Figure 1.1.


Figure 1.1. A typical Brownian motion path.

The choice $B(0) = 0$ is arbitrary, and we often consider Brownian motion starting at $x$, for which $B(0) = x$ for some fixed point $x$. For Brownian motion starting at $x$, the variance of $B(t)$ is $\sigma^2 t$, and $\sigma^2$ is termed the variance parameter in the stochastic process literature. The process $\tilde{B}(t) = B(t)/\sigma$ is a Brownian motion process whose variance parameter is one, the so-called standard Brownian motion. By this device, we may always reduce an arbitrary Brownian motion to a standard Brownian motion; for the most part, we derive results only for the latter. By part (a) of the definition, for a standard Brownian motion ($\sigma^2 = 1$), we have

$$\Pr\{B(s + t) \le y \mid B(s) = x\} = \Pr\{B(s + t) - B(s) \le y - x\} = \Phi\left(\frac{y - x}{\sqrt{t}}\right).$$
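
Transition probabilities of this kind reduce to a single evaluation of $\Phi$. As a hedged illustration (the helper names `std_normal_cdf` and `transition_cdf` are not from the text), the following Python sketch evaluates $\Phi$ through the error function in the standard library and allows a general variance parameter $\sigma^2$ by rescaling as described above.

```python
import math


def std_normal_cdf(z):
    """Standard normal cumulative distribution function Phi(z), via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def transition_cdf(y, t, x=0.0, sigma2=1.0):
    """Pr{B(s + t) <= y | B(s) = x} for Brownian motion with variance parameter sigma2.

    The increment over a time span t is normal with mean 0 and variance sigma2 * t.
    """
    return std_normal_cdf((y - x) / math.sqrt(sigma2 * t))


# One time unit after starting at 0, a standard Brownian motion lies below 1
# with probability Phi(1).
print(round(transition_cdf(y=1.0, t=1.0), 4))  # 0.8413
```

The printed value matches the entry for $x = 1$ in the table of the cumulative normal distribution at the end of this section.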


Remark Let us look, for the moment, at a Brownian displacement $B(\Delta t)$ after a small elapsed time $\Delta t$. The mean displacement is zero, and the variance of the displacement is $\Delta t$ itself. It is much more common in practical work to use a standard deviation, the square root of the variance, to measure variability. For the normal distribution, for instance, the probability of an observation more than 2 standard deviations away from the mean is about 5%, and the standard deviation is in the same units as the original measurement, and not (units)$^2$. The standard deviation of the Brownian displacement is $\sqrt{\Delta t}$, which is much larger than $\Delta t$ itself when $\Delta t$ is small. Indeed, $\mathrm{StdDev}[B(\Delta t)]/\Delta t = \sqrt{\Delta t}/\Delta t = 1/\sqrt{\Delta t} \to \infty$ as $\Delta t \to 0$. This is simply another manifestation of the erratic movements of the Brownian particle, yet it is a point that Bachelier and others had difficulty in handling. But the variance, being linear in time, and not the standard deviation, is the only possibility if displacements over disjoint time intervals are to be stationary and independent. Write a total displacement $B(s + t) - B(0)$ as the sum of two incremental steps in the form $B(s + t) - B(0) = \{B(s) - B(0)\} + \{B(s + t) - B(s)\}$. The incremental steps being statistically independent, their variances must add. The stationary assumption is that the statistics of the second step $B(s + t) - B(s)$ do not depend on the time $s$ when the step began, but only on the duration $t$ of the movement. We must have, then, $\mathrm{Var}[B(s + t)] = \mathrm{Var}[B(s)] + \mathrm{Var}[B(t)]$, and the only nonnegative solution to such an equation is to have the variance of the displacement a linear function of time.

The Covariance Function


Using the independent increments assumption (b), we will determine the covariance of the Brownian motion. Recall that $E[B(t)] = 0$ and that $E[B(t)^2] = \sigma^2 t$. Then, for $0 \le s < t$,

$$\begin{aligned}
\mathrm{Cov}[B(s), B(t)] &= E[B(s)B(t)] \\
&= E[B(s)\{B(t) - B(s) + B(s)\}] \\
&= E[B(s)^2] + E[B(s)\{B(t) - B(s)\}] \\
&= \sigma^2 s + E[B(s)]\,E[B(t) - B(s)] \quad \text{(by (b))} \\
&= \sigma^2 s \quad \text{(since } E[B(s)] = 0\text{)}.
\end{aligned}$$

Similarly, if $0 \le t < s$, we obtain $\mathrm{Cov}[B(s), B(t)] = \sigma^2 t$. Both cases may be treated in a single expression via

$$\mathrm{Cov}[B(s), B(t)] = \sigma^2 \min\{s, t\}, \quad \text{for } s, t \ge 0. \tag{1.10}$$
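
The covariance formula can be checked empirically. The sketch below is a Monte Carlo illustration under stated assumptions, not a derivation: it builds $(B(s), B(t))$ for $s < t$ from two independent normal increments, exactly as properties (a) and (b) prescribe, and compares the sample covariance with $\sigma^2 \min\{s, t\}$. The function names, the sample size, and the seed are arbitrary choices.

```python
import math
import random


def simulate_pair(s, t, sigma2, rng):
    """Return (B(s), B(t)) for 0 < s < t, built from two independent increments."""
    b_s = rng.gauss(0.0, math.sqrt(sigma2 * s))               # B(s) - B(0)
    b_t = b_s + rng.gauss(0.0, math.sqrt(sigma2 * (t - s)))   # add B(t) - B(s)
    return b_s, b_t


def sample_cov(s, t, sigma2=1.0, n=100_000, seed=1):
    """Sample covariance of B(s) and B(t) over n simulated pairs."""
    rng = random.Random(seed)
    pairs = [simulate_pair(s, t, sigma2, rng) for _ in range(n)]
    mean_s = sum(p[0] for p in pairs) / n
    mean_t = sum(p[1] for p in pairs) / n
    return sum((p[0] - mean_s) * (p[1] - mean_t) for p in pairs) / (n - 1)


# Compare with sigma^2 * min{s, t} = 2 * 1 = 2; the estimate carries sampling noise.
print(sample_cov(s=1.0, t=3.0, sigma2=2.0))
```

The estimate fluctuates around the theoretical value; with the sample size shown, the sampling error is on the order of a few hundredths.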




1.3. The Central Limit Theorem and the 
Invariance Principle 


Let $S_n = \xi_1 + \cdots + \xi_n$ be the sum of $n$ independent and identically distributed random variables $\xi_1, \ldots, \xi_n$ having zero means and unit variances. In this case, the central limit theorem asserts that

$$\lim_{n \to \infty} \Pr\left\{\frac{S_n}{\sqrt{n}} \le x\right\} = \Phi(x) \quad \text{for all } x.$$

The central limit theorem is stated as a limit. In stochastic modeling, it is used to justify the normal distribution as appropriate for a random quantity whose value results from numerous small random effects, all acting independently and additively. It also justifies the approximate calculation of probabilities for the sum of independent and identically distributed summands in the form $\Pr\{S_n \le x\} \approx \Phi(x/\sqrt{n})$, the approximation known to be excellent even for moderate values of $n$ in most cases in which the distribution of the summands is not too skewed.
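
To get a feeling for the quality of the approximation $\Pr\{S_n \le x\} \approx \Phi(x/\sqrt{n})$, one can compare it with an exact calculation in a case where the distribution of $S_n$ is known. The Python sketch below assumes summands equal to $\pm 1$ with probability $\frac{1}{2}$ each, so that $S_n = 2H - n$ with $H$ binomial; this choice of summand and the function names are illustrative assumptions only.

```python
import math


def exact_cdf(n, x):
    """Pr{S_n <= x} when S_n is a sum of n independent +-1 steps (probability 1/2 each)."""
    # S_n = 2*H - n with H ~ Binomial(n, 1/2), so S_n <= x iff H <= (x + n) / 2.
    h_max = math.floor((x + n) / 2)
    return sum(math.comb(n, h) for h in range(0, min(h_max, n) + 1)) / 2**n


def normal_approx(n, x):
    """The central limit approximation Phi(x / sqrt(n))."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * n)))


for n, x in [(25, 5), (100, 10), (400, 20)]:
    print(n, x, round(exact_cdf(n, x), 4), round(normal_approx(n, x), 4))
```

Because $S_n$ takes only integer values while the normal distribution is continuous, the two columns differ by an amount that shrinks as $n$ grows; no continuity correction is applied in this sketch.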

In a similar manner, functionals computed for a Brownian motion can often serve as excellent approximations for analogous functionals of a partial sum process, as we now explain. As a function of the continuous variable $t$, define

$$B_n(t) = \frac{S_{[nt]}}{\sqrt{n}}, \quad t \ge 0, \tag{1.11}$$

where $[x]$ is the greatest integer less than or equal to $x$. Observe that

$$B_n(t) = \frac{S_k}{\sqrt{n}} = \frac{\sqrt{k}}{\sqrt{n}} \cdot \frac{S_k}{\sqrt{k}} \quad \text{for } \frac{k}{n} \le t < \frac{k+1}{n}.$$

Because $S_k/\sqrt{k}$ has unit variance, the variance of $B_n(t)$ is $[nt]/n$, which converges to $t$ as $n \to \infty$. When $n$ is large, then $k = [nt]$ is large, and $S_k/\sqrt{k}$ is approximately normally distributed by the central limit theorem, and, finally, $B_n(t)$ inherits the independent increments property (b) from the postulated independence of the summands. It is reasonable, then, to believe that $B_n(t)$ should behave much like a standard Brownian motion process, at least when $n$ is large. This is indeed true, and while we cannot explain the precise way in which it holds in an introductory text such as this, the reader should leave with some intuitive feeling for the usefulness of the result and, we hope, a motivation to learn more about stochastic processes. The convergence of the sequence of stochastic processes defined in (1.11) to a standard Brownian motion is termed the invariance principle. It asserts that some functionals of a partial sum process of independent and identically distributed zero mean and unit variance random variables should not depend too heavily on (should be invariant of) the actual distribution of the summands, but be approximately given by the analogous functional of a standard Brownian motion, provided only that the summands are not too badly behaved.


Example Suppose that the summands have the distribution in which $\xi = \pm 1$, each with probability $\frac{1}{2}$. Then the partial sum process $S_n$ is a simple random walk for which we calculated in III, Section 5.3 (using a different notation)

$$\Pr\{S_n \text{ reaches } -a < 0 \text{ before } b > 0 \mid S_0 = 0\} = \frac{b}{a + b}. \tag{1.12}$$

Upon changing the scale in accordance with (1.11), we have

$$\Pr\{B_n(t) \text{ reaches } -a < 0 \text{ before } b > 0 \mid S_0 = 0\} = \frac{b\sqrt{n}}{a\sqrt{n} + b\sqrt{n}} = \frac{b}{a + b},$$

and invoking the invariance principle, it should be, and is, the case that for a standard Brownian motion we have

$$\Pr\{B(t) \text{ reaches } -a < 0 \text{ before } b > 0 \mid B(0) = 0\} = \frac{b}{a + b}. \tag{1.13}$$

Finally, invoking the invariance principle for a second time, the evaluation in (1.12) should hold approximately for an arbitrary partial sum process, provided only that the independent and identically distributed summands have zero means and unit variances.
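
The hitting probability $b/(a + b)$ in (1.12) is easy to check by simulation. The following Python sketch is illustrative only: it assumes integer barriers $a$ and $b$, uses the simple $\pm 1$ random walk of the example, and the function names, trial count, and seed are arbitrary choices.

```python
import random


def hits_lower_first(a, b, rng):
    """Run a simple +-1 random walk from 0 until it reaches -a or b.

    Returns True if the lower barrier -a is reached first. Assumes a, b are
    positive integers, so the walk lands exactly on a barrier.
    """
    pos = 0
    while -a < pos < b:
        pos += 1 if rng.random() < 0.5 else -1
    return pos == -a


def estimate_hit_probability(a, b, trials=20_000, seed=0):
    """Monte Carlo estimate of Pr{walk reaches -a before b}."""
    rng = random.Random(seed)
    return sum(hits_lower_first(a, b, rng) for _ in range(trials)) / trials


# Theory gives b / (a + b) = 2 / 5 for a = 3, b = 2; the estimate is noisy.
print(estimate_hit_probability(a=3, b=2))
```

Replacing the $\pm 1$ step with another zero-mean, unit-variance step distribution would turn this into a rough empirical test of the second use of the invariance principle described above, although the walk may then overshoot a barrier rather than land on it.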


1.4. Gaussian Processes 


A random vector $X_1, \ldots, X_n$ is said to have a multivariate normal distribution, or a joint normal distribution, if every linear combination $\alpha_1 X_1 + \cdots + \alpha_n X_n$, $\alpha_i$ real, has a univariate normal distribution. Obviously, if $X_1, \ldots, X_n$ has a joint normal distribution, then so does the random vector $Y_1, \ldots, Y_m$ defined by the linear transformation in which

$$Y_j = \alpha_{j1} X_1 + \cdots + \alpha_{jn} X_n, \quad \text{for } j = 1, \ldots, m,$$

for arbitrary constants $\alpha_{ji}$.

The multivariate normal distribution is specified by two parameters, the mean values $\mu_i = E[X_i]$ and the covariance matrix whose entries are $\Gamma_{ij} = \mathrm{Cov}[X_i, X_j]$. In the joint normal distribution, $\Gamma_{ij} = 0$ is sufficient to imply that $X_i$ and $X_j$ are independent random variables.

Let $T$ be an abstract set and $\{X(t); t \text{ in } T\}$ a stochastic process. We call $\{X(t); t \text{ in } T\}$ a Gaussian process if for every $n = 1, 2, \ldots$ and every finite subset $\{t_1, \ldots, t_n\}$ of $T$, the random vector $(X(t_1), \ldots, X(t_n))$ has a multivariate normal distribution. Equivalently, the process is Gaussian if every linear combination

$$\alpha_1 X(t_1) + \cdots + \alpha_n X(t_n), \quad \alpha_i \text{ real},$$

has a univariate normal distribution. Every Gaussian process is described uniquely by its two parameters, the mean and covariance functions, given respectively by

$$\mu(t) = E[X(t)], \quad t \text{ in } T,$$

and

$$\Gamma(s, t) = E[\{X(s) - \mu(s)\}\{X(t) - \mu(t)\}], \quad s, t \text{ in } T.$$

The covariance function is positive definite in the sense that for every $n = 1, 2, \ldots$, real numbers $\alpha_1, \ldots, \alpha_n$, and elements $t_1, \ldots, t_n$ in $T$,

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \Gamma(t_i, t_j) \ge 0.$$

One need only evaluate the expected value of $\left(\sum_{i=1}^{n} \alpha_i \{X(t_i) - \mu(t_i)\}\right)^2 \ge 0$ in terms of the covariance function in order to verify this.

Conversely, given an arbitrary mean value function $\mu(t)$ and a positive definite covariance function $\Gamma(s, t)$, there then exists a corresponding Gaussian process. Brownian motion is the unique Gaussian process having continuous trajectories, zero mean, and covariance function (1.10). We shall use this feature, that the mean value and covariance functions define a Gaussian process, several times in what follows.

We have seen how the invariance principle leads to the Gaussian process called Brownian motion. Gaussian processes also arise as the limits of normalized sums of independent and identically distributed random functions. To sketch out this idea, let $\xi_1(t), \xi_2(t), \ldots$ be independent and identically distributed random functions, or stochastic processes. Let $\mu(t) = E[\xi(t)]$ and $\Gamma(s, t) = \mathrm{Cov}[\xi(s), \xi(t)]$ be the mean value and covariance functions, respectively. Motivated by the central limit theorem, we define

$$X_N(t) = \frac{\sum_{i=1}^{N} \{\xi_i(t) - \mu(t)\}}{\sqrt{N}}.$$

The central limit theorem tells us that the distribution of $X_N(t)$ converges to the normal distribution for each fixed time point $t$. A multivariate extension of the central limit theorem asserts that for any finite set of time points $(t_1, \ldots, t_n)$, the random vector

$$(X_N(t_1), \ldots, X_N(t_n))$$

has, in the limit for large $N$, a multivariate normal distribution. It is not difficult to believe, then, that under ordinary circumstances, the stochastic processes $\{X_N(t); t \ge 0\}$ would converge, in an appropriate sense, to a Gaussian process $\{X(t); t \ge 0\}$ whose mean is zero and whose covariance function is $\Gamma(s, t)$. We call this the central limit principle for random functions. Several instances of its application appear in this chapter, the first of which is next.


Example Cable Strength Under Equal Load Sharing Consider a cable constructed from $N$ wires in parallel. Suspension bridge cables are usually built this way. A section of the cable is clamped at each end and elongated, by increasing the distance between the clamps. The problem is to determine the maximum tensile load that the cable will sustain in terms of the probabilistic and mechanical characteristics of the individual wires.

Let $L_0$ be the reference length of an unstretched unloaded strand of wire, and let $L$ be the length after elongation. The nominal strain is defined to be $t = (L - L_0)/L_0$. Steadily increasing $t$ causes the strand to stretch and exert a force $\xi(t)$ on the clamps, up to some random failure strain $\zeta$, at which point the wire breaks. Hooke’s law of elasticity asserts that the wire force is proportional to wire strain, with Young’s modulus $K$ as the proportionality constant. Taken all together, we write the force on the wire as a function of the nominal strain as

$$\xi(t) = \begin{cases} Kt, & \text{for } 0 \le t < \zeta, \\ 0, & \text{for } \zeta \le t. \end{cases} \tag{1.14}$$

A typical load function is depicted in Figure 1.2. We will let $F(x) = \Pr\{\zeta \le x\}$ be the cumulative distribution function of the failure strain. We easily determine the mean load on the wire to be

$$\mu(t) = E[\xi(t)] = E[Kt\,\mathbf{1}\{t < \zeta\}] = Kt[1 - F(t)]. \tag{1.15}$$

The higher moments are, for $0 < s < t$,

$$E[\xi(s)\xi(t)] = K^2 st\, E[\mathbf{1}\{s < \zeta\}\mathbf{1}\{t < \zeta\}] = K^2 st[1 - F(t)]$$

and

$$\Gamma(s, t) = E[\xi(s)\xi(t)] - E[\xi(s)]E[\xi(t)] = K^2 st\,F(s)[1 - F(t)], \quad \text{for } 0 < s < t. \tag{1.16}$$

Figure 1.2 The load $\xi(t)$ on an elastic wire as a function of nominal strain $t$. At a strain of $\zeta$ the wire fails, and the load carried drops to zero.


Turning to the cable, if it is clamped at the ends and elongated, then each wire within it is elongated the same amount. The total force $S_N(t)$ on the cable is the sum of the forces exerted by the individual wires. If we assume that the wires are independent and a priori identical, then these wire forces $\xi_1(t), \xi_2(t), \ldots$ are independent and identically distributed random functions, and

$$S_N(t) = \sum_{i=1}^{N} \xi_i(t)$$

is the random load experienced by the cable as a function of the cable strain. An illustration when $N = 5$ is given in Figure 1.3.

We are interested in the maximum load that the cable could carry without failing. This is

$$Q_N = \max\{S_N(t); t \ge 0\}.$$

Figure 1.3 The load $S_N(t)$ experienced by a cable composed of five elastic wire strands as a function of the cable strain $t$. The peak of the curve is the maximum sustainable cable load.


To obtain an approximation to the distribution of $Q_N$, we apply the central limit principle for random functions. This leads us to believe that

$$X_N(t) = \frac{S_N(t) - N\mu(t)}{\sqrt{N}}$$

should, for large $N$, be approximately a Gaussian process $X(t)$ with mean zero and covariance function given by (1.16). We write this approximation in the form

$$S_N(t) \approx N\mu(t) + \sqrt{N}\,X(t). \tag{1.17}$$

When $N$ is large, the dominant term on the right of (1.17) is $N\mu(t)$. Let $t^*$ be the value of $t$ that maximizes $\mu(t)$. We assume that $t^*$ is unique and that the second derivative of $\mu(t)$ is strictly negative at $t = t^*$. We would then expect that

$$Q_N = \max_t S_N(t) \approx N\mu(t^*) + \sqrt{N}\,X(t^*). \tag{1.18}$$

That is, we would expect that the cable strength would be approximately normally distributed with mean $N\mu(t^*)$ and variance $N\Gamma(t^*, t^*)$. To carry out a numerical example, suppose that $F(x) = 1 - \exp\{-x^5\}$, a Weibull distribution with shape parameter 5. It is easily checked that $t^* = 1/5^{1/5} = 0.7248$, that $\mu(t^*) = 0.5934K$, and $\Gamma(t^*, t^*) = (0.2792)^2 K^2$. A cable composed of $N = 30$ wires would have a strength that is approximately normally distributed with mean $30(0.5934)K = 17.8K$ and standard deviation $0.2792\sqrt{30}\,K = 1.5292K$.
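
The numbers in this example can be reproduced in a few lines. The Python sketch below is an illustration under the example's assumptions, namely $F(x) = 1 - \exp\{-x^c\}$ with shape parameter $c$, for which $\mu(t) = Kt\exp\{-t^c\}$ is maximized where $1 - c\,t^c = 0$. The function name `cable_strength` and its parameters are hypothetical, and the variance uses (1.16) evaluated at $s = t = t^*$.

```python
import math


def cable_strength(n_wires, shape=5.0, k=1.0):
    """Normal approximation to the strength of a cable of n_wires parallel wires.

    Assumes Weibull failure strain F(x) = 1 - exp(-x**shape) and Young's
    modulus k. Returns (t_star, mean, std_dev).
    """
    # mu(t) = k * t * exp(-t**shape); setting mu'(t) = 0 gives shape * t**shape = 1.
    t_star = shape ** (-1.0 / shape)
    survive = math.exp(-(t_star ** shape))                   # 1 - F(t*)
    mu = k * t_star * survive                                # (1.15) at t*
    gamma = (k * t_star) ** 2 * (1.0 - survive) * survive    # (1.16) with s = t = t*
    return t_star, n_wires * mu, math.sqrt(n_wires * gamma)


t_star, mean, std_dev = cable_strength(30)
print(round(t_star, 4), round(mean, 1), round(std_dev, 4))
```

With $N = 30$ and $K = 1$ this returns values that agree, to the precision quoted in the text, with $t^* = 0.7248$, mean $17.8K$, and standard deviation $1.5292K$.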

The above heuristics can be justified, and indeed, significant refinements 
in the approximation have been made. We have referred to the approach as 
the central limit principle for random functions because we have not supplied 
sufficient details to label it a theorem. Nevertheless, we will see several more 
applications of the principle in subsequent sections of this chapter. 


Table 8.1. The cumulative normal distribution

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

      x        Φ(x)
    -3         0.00135
    -2         0.02275
    -1         0.1587
     0         0.5000
     1         0.8413
     2         0.97725
     3         0.99865
    -2.326     0.01
    -1.96      0.025
    -1.645     0.05
    -1.282     0.10
     1.282     0.90
     1.645     0.95
     1.96      0.975


Exercises


1.1. Let $\{B(t); t \ge 0\}$ be a standard Brownian motion.

(a) Evaluate $\Pr\{B(4) \le 3 \mid B(0) = 1\}$.
(b) Find the number $c$ for which $\Pr\{B(9) > c \mid B(0) = 1\} = 0.10$.


1.2. Let $\{B(t); t \ge 0\}$ be a standard Brownian motion and $c > 0$ a constant. Show that the process defined by $W(t) = cB(t/c^2)$ is a standard Brownian motion.


1.3.
(a) Show that

$$\frac{d\phi(x)}{dx} = \phi'(x) = -x\phi(x),$$

where $\phi(x)$ is given in (1.5).

(b) Use the result in (a) together with the chain rule of differentiation to show that

$$p(y, t|x) = \phi_t(y - x) = \frac{1}{\sqrt{t}}\, \phi\left(\frac{y - x}{\sqrt{t}}\right)$$

satisfies the diffusion equation (1.3).


1.4. Consider a standard Brownian motion $\{B(t); t \ge 0\}$ at times $0 < u < u + v < u + v + w$, where $u, v, w > 0$.

(a) Evaluate the product moment $E[B(u)B(u + v)B(u + v + w)]$.
(b) Evaluate the product moment

$$E[B(u)B(u + v)B(u + v + w)B(u + v + w + x)],$$

where $x > 0$.


1.5. Determine the covariance functions for the stochastic processes

(a) $U(t) = e^{-t}B(e^{2t})$, for $t \ge 0$.
(b) $V(t) = (1 - t)B(t/(1 - t))$, for $0 < t < 1$.
(c) $W(t) = tB(1/t)$, with $W(0) = 0$.

$B(t)$ is standard Brownian motion.


1.6. Consider a standard Brownian motion $\{B(t); t \ge 0\}$ at times $0 < u < u + v < u + v + w$, where $u, v, w > 0$.

(a) What is the probability distribution of $B(u) + B(u + v)$?
(b) What is the probability distribution of $B(u) + B(u + v) + B(u + v + w)$?


1.7. Suppose that in the absence of intervention, the cash on hand for a certain corporation fluctuates according to a standard Brownian motion $\{B(t); t \ge 0\}$. The company manages its cash using an $(s, S)$ policy: If the cash level ever drops to zero, it is instantaneously replenished up to level $s$; if the cash level ever rises up to $S$, sufficient cash is invested in long-term securities to bring the cash-on-hand down to level $s$. In the long run, what fraction of cash interventions are investments of excess cash? Hint: Use equation (1.13).


Problems 
1.1. Consider the simple random walk 
So=G te + &, So = 0, 


in which the summands are independent with Pr{é = +1} = 3. In III, Sec- 
tion 5.3, we showed that the mean time for the random walk to first reach 
—-a<Oorb> 01s ab. Use this together with the invariance principle to 
show that E[T] = ab, where 


T = T,, = min{t = 0; B(t) = —a or Bt) = 5}, 


and B(t) is standard Brownian motion. Hint: The approximate Brownian 
motion (1.11) rescales the random walk in both time and space. 


1.2. Evaluate E[e**”] for an arbitrary constant A and standard Brownian 
motion B(f). 


1.3. For a positive constant $\varepsilon$, show that

$$\Pr\left\{\frac{|B(t)|}{t} > \varepsilon\right\} = 2\{1 - \Phi(\varepsilon\sqrt{t})\}.$$

How does this behave when t is large ($t \to \infty$)? How does it behave when
t is small ($t \to 0$)?

1.4. Let $\alpha_1, \ldots, \alpha_n$ be real constants. Argue that

$$\sum_{i=1}^{n} \alpha_i B(t_i)$$

is normally distributed with mean zero and variance

$$\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \min\{t_i, t_j\}.$$


1.5. Consider the simple random walk

$$S_n = \xi_1 + \cdots + \xi_n, \qquad S_0 = 0,$$

in which the summands are independent with $\Pr\{\xi = \pm 1\} = \tfrac{1}{2}$. We are
going to stop this random walk when it first drops a units below its
maximum to date. Accordingly, let

$$M_n = \max_{0 \le k \le n} S_k, \qquad Y_n = M_n - S_n, \qquad \text{and} \qquad \tau = \tau_a = \min\{n \ge 0; Y_n = a\}.$$

(a) Use a first step analysis to show that

$$\Pr\{M_\tau = 0\} = \frac{1}{1 + a}.$$

(b) Why is $\Pr\{M_\tau \ge 2\} = [\Pr\{M_\tau \ge 1\}]^2$, and

$$\Pr\{M_\tau \ge k\} = \left(\frac{a}{1 + a}\right)^k?$$

Identify the distribution of $M_\tau$.

(c) Let B(t) be standard Brownian motion, $M(t) = \max\{B(u); 0 \le u \le t\}$,
$Y(t) = M(t) - B(t)$, and $\tau = \min\{t \ge 0; Y(t) = a\}$. Use
the invariance principle to argue that $M(\tau)$ has an exponential
distribution with mean a. Note: $\tau$ is a popular strategy for timing the
sale of a stock. It calls for keeping the stock as long as it is going
up, but to sell it the first time that it drops a units from its best price
to date. We have shown that $E[M(\tau)] = a$, whence $E[B(\tau)] =
E[M(\tau)] - a = 0$, so that the strategy does not gain a profit, on
average, in the Brownian motion model for stock prices.

1.6. Manufacturers of crunchy munchies such as cheese crisps use
compression testing machines to gauge product quality. The crisp, or
whatever, is placed between opposing plates, which then move together.
As the crisp is crunched, the force is measured as a function of the
distance that the plates have moved. The output of the compression testing
machine is a graph of force versus distance that is much like Figure 1.3.
What aspects of the graph might be measures of product quality? Model
the test as a row of tiny balloons between parallel plates. Each single
balloon might follow a force-distance behavior of the form $\sigma = Ke(1 - q(e))$,
where $\sigma$ is the force, K is Young's modulus, e is strain or distance, and
q(e) is a function that measures departures from Hooke's law, to allow for
soggy crisps. Each balloon obeys this relationship up until the random
strain $\xi$ at which it bursts. Determine the mean force as a function of
strain. Use F(x) for the cumulative distribution function of failure strain.


1.7. For $n = 0, 1, \ldots$, show that (a) $B(n)$ and (b) $B(n)^2 - n$ are
martingales (see II, Section 5).


1.8. Computer Challenge A problem of considerable contemporary
importance is how to simulate a Brownian motion stochastic process. The
invariance principle provides one possible approach. An infinite series
expression that N. Wiener introduced may provide another approach. Let
$Z_0, Z_1, \ldots$ be a series of independent standard normal random variables. The
infinite series

$$B(t) = \frac{t}{\sqrt{\pi}} Z_0 + \sqrt{\frac{2}{\pi}} \sum_{m=1}^{\infty} \frac{\sin mt}{m} Z_m$$

is a standard Brownian motion for $0 \le t \le 1$. Try to simulate a Brownian
motion stochastic process, at least approximately, by using finite sums of
the form

$$B_N(t) = \frac{t}{\sqrt{\pi}} Z_0 + \sqrt{\frac{2}{\pi}} \sum_{m=1}^{N} \frac{\sin mt}{m} Z_m, \qquad 0 \le t \le 1.$$

If $B(t)$, $0 \le t \le 1$, is a standard Brownian motion on the interval [0, 1],
then $B'(t) = (1 + t)B(1/(1 + t))$, $0 \le t < \infty$, is a standard Brownian
motion on the interval $[0, \infty)$. This suggests

$$B'_N(t) = (1 + t)B_N\left(\frac{1}{1 + t}\right), \qquad 0 \le t < \infty,$$

as an approximate standard Brownian motion. In what ways do these finite
approximations behave like Brownian motion? Clearly, they are zero
mean Gaussian processes. What is the covariance function, and how does
it compare to that of Brownian motion? Do the gambler's ruin probabilities
of (1.13) accurately describe their behavior? It is known that the
squared variation of a Brownian motion stochastic process is not random,
but constant:


$$\lim_{n \to \infty} \sum_{k=1}^{n} \left| B\left(\frac{k}{n}\right) - B\left(\frac{k-1}{n}\right) \right|^2 = 1.$$

This is a further consequence of the variance relation $E[(\Delta B)^2] = \Delta t$ (see
the remark in Section 1.2). To what degree do the finite approximations
meet this criterion?
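A possible starting point for this computer challenge is sketched below in Python, a language chosen here because the text prescribes none. The helper names `wiener_partial_sum` and `partial_sum_variance` are illustrative and do not come from the text. The first draws one path of the finite sum $B_N(t)$ on a grid. The second evaluates $\mathrm{Var}[B_N(t)] = t^2/\pi + (2/\pi)\sum_{m=1}^{N} \sin^2(mt)/m^2$, which follows from the independence of the $Z_m$, so that it can be compared with the Brownian value t.

```python
import math
import random

def wiener_partial_sum(times, n_terms, rng):
    """Evaluate one sample path of the finite sum B_N(t) at the given times."""
    z0 = rng.gauss(0.0, 1.0)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]  # Z_1, ..., Z_N
    path = []
    for t in times:
        series = sum(math.sin(m * t) / m * z[m - 1] for m in range(1, n_terms + 1))
        path.append(t / math.sqrt(math.pi) * z0 + math.sqrt(2.0 / math.pi) * series)
    return path

def partial_sum_variance(t, n_terms):
    """Var[B_N(t)], using independence of the standard normal coefficients."""
    tail = sum(math.sin(m * t) ** 2 / m ** 2 for m in range(1, n_terms + 1))
    return t * t / math.pi + (2.0 / math.pi) * tail

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
grid = [k / 100 for k in range(101)]
path = wiener_partial_sum(grid, n_terms=200, rng=rng)
print(len(path), path[0])               # 101 grid values; the path starts at 0.0
print(partial_sum_variance(0.5, 2000))  # slightly below t = 0.5
```

The variance of the truncated series falls short of t by at most $(2/\pi)/N$, which gives one concrete measure of how far a finite approximation is from Brownian motion.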


2. The Maximum Variable and the 
Reflection Principle 


Using the continuity of the trajectories of Brownian motion and the sym- 
metry of the normal distribution, we will determine a variety of interest- 
ing probability expressions for the Brownian motion process. The starting 
point is the reflection principle. 


2.1. The Reflection Principle 


Let B(t) be a standard Brownian motion. Fix a value x > 0 and a time
t > 0. Bearing in mind the continuity of the Brownian motion, property
(c) of the definition, consider the collection of sample paths B(u) for
$u \ge 0$ with B(0) = 0 and for which B(t) > x. Since B(u) is continuous and
B(0) = 0, there exists a time $\tau$, itself a random variable depending on the
particular sample trajectory, at which the Brownian motion B(u) first
attains the value x.

We next describe a new path B*(u) obtained from B(u) by reflection.
For $u > \tau$, we reflect B(u) about the horizontal line at height x > 0 to
obtain

$$B^*(u) = \begin{cases} B(u), & \text{for } u \le \tau, \\ x - [B(u) - x], & \text{for } u > \tau. \end{cases}$$


Figure 2.1 illustrates the construction. Note that B*(t) < x because
B(t) > x.

Figure 2.1 The path B(u) is reflected about the horizontal line at x, showing
that for every path ending at B(t) > x, there are two paths, B(u)
and B*(u), that attain the value x somewhere in the interval
$0 \le u \le t$.

Because the conditional probability law of the path for $u > \tau$, given that
$B(\tau) = x$, is symmetric with respect to the values y > x and y < x, and
independent of the history prior to $\tau$,¹ the reflection argument displays for
every sample path with B(t) > x two equally likely sample paths, B(u) and
B*(u), for which both

$$\max_{0 \le u \le t} B(u) > x \quad \text{and} \quad \max_{0 \le u \le t} B^*(u) > x.$$

Conversely, by the nature of this correspondence, every sample path B(u)
for which $\max_{0 \le u \le t} B(u) > x$ results from either of two equally likely
sample paths, exactly one of which is such that B(t) > x. The two-to-one
correspondence fails only if B(t) = x, but because B(t) is a continuous
random variable (normal distribution), we have Pr{B(t) = x} = 0, and this
case can be safely ignored. Thus we conclude that


$$\Pr\left\{\max_{0 \le u \le t} B(u) > x\right\} = 2\Pr\{B(t) > x\}.$$

In terms of the maximum process defined by

$$M(t) = \max_{0 \le u \le t} B(u), \tag{2.1}$$

and using the notation set forth in (1.8), we have

$$\Pr\{M(t) > x\} = 2[1 - \Phi_t(x)]. \tag{2.2}$$
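As a numerical illustration of (2.2), the short Python sketch below evaluates the tail probability of the maximum. It assumes that the notation of (1.8) means $\Phi_t(x) = \Phi(x/\sqrt{t})$, the distribution function of B(t). The function names are illustrative only.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_max_exceeds(x, t):
    """Pr{M(t) > x} = 2[1 - Phi(x / sqrt(t))] for x > 0 and t > 0, as in (2.2)."""
    return 2.0 * (1.0 - std_normal_cdf(x / math.sqrt(t)))

print(prob_max_exceeds(2.0, 4.0))  # about 0.3173, since x / sqrt(t) = 1
```

With x = 2 and t = 4 the standardized level is 1, so the result is twice the upper tail of the standard normal distribution beyond one standard deviation.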


2.2. The Time to First Reach a Level

With the help of (2.2) we may determine the probability distribution of the
random time $\tau_x$ at which the Brownian motion first attains a prescribed
value x > 0 starting from B(0) = 0. Formally, define the hitting time

$$\tau_x = \min\{u \ge 0; B(u) = x\}. \tag{2.3}$$

Clearly, $\tau_x \le t$ if and only if $M(t) \ge x$. In words, the Brownian motion
attains the level x > 0 before time t if and only if at time t the maximum of
the process is at least x. If the two events are equivalent, then their
probabilities must be the same. That is,

$$\Pr\{\tau_x \le t\} = \Pr\{M(t) \ge x\} = 2[1 - \Phi_t(x)] = \frac{2}{\sqrt{2\pi t}} \int_x^{\infty} e^{-\xi^2/(2t)}\, d\xi. \tag{2.5}$$

¹ The argument is not quite complete because the definition asserts that an increment in
a Brownian motion after a fixed time t is independent of the past, whereas here we are
restarting from the random time $\tau$. While the argument is incomplete, the assertion is true:
A Brownian path begins afresh from hitting times such as $\tau$.


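The change of variable $\xi = \eta\sqrt{t}$ leads to

$$\Pr\{\tau_x \le t\} = \sqrt{\frac{2}{\pi}} \int_{x/\sqrt{t}}^{\infty} e^{-\eta^2/2}\, d\eta. \tag{2.6}$$

The probability density function of the random time $\tau_x$ is obtained by
differentiating (2.6) with respect to t, giving

$$f_{\tau_x}(t) = \frac{x t^{-3/2}}{\sqrt{2\pi}} e^{-x^2/(2t)} \quad \text{for } 0 < t < \infty. \tag{2.7}$$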
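The relation between (2.6) and (2.7) can be checked numerically. The Python sketch below, with illustrative function names, integrates the density (2.7) by the midpoint rule and compares the result with the distribution function (2.6), written here through the complementary error function.

```python
import math

def hitting_time_cdf(t, x):
    """Pr{tau_x <= t} from (2.6); this equals erfc(x / sqrt(2 t))."""
    return math.erfc(x / math.sqrt(2.0 * t))

def hitting_time_pdf(t, x):
    """Density (2.7) of the first time a standard Brownian motion reaches x > 0."""
    return x * t ** -1.5 / math.sqrt(2.0 * math.pi) * math.exp(-x * x / (2.0 * t))

def integrate_pdf(t, x, steps=20000):
    """Midpoint-rule integral of the density over (0, t]."""
    h = t / steps
    return h * sum(hitting_time_pdf((k + 0.5) * h, x) for k in range(steps))

print(integrate_pdf(4.0, 2.0), hitting_time_cdf(4.0, 2.0))  # the two values agree closely
```

The step count is an arbitrary choice; the density vanishes very rapidly near t = 0, so the midpoint rule has no difficulty there.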


2.3. The Zeros of Brownian Motion 


As a final illustration of the far-reaching consequences of the reflection
principle and equation (2.7), we will determine the probability that a
standard Brownian motion B(t), with B(0) = 0, will cross the t axis at least
once in the time interval (t, t + s] for t, s > 0. Let us denote this quantity
by $\vartheta(t, t + s)$. The result is

$$\vartheta(t, t + s) = \Pr\{B(u) = 0 \text{ for some } u \text{ in } (t, t + s]\} = \frac{2}{\pi} \arctan\sqrt{s/t} = \frac{2}{\pi} \arccos\sqrt{t/(t + s)}. \tag{2.8}$$


First, let us define some notation concerning the hitting time $\tau_x$ defined in
(2.3). Let

$$H_t(z, x) = \Pr\{\tau_x \le t \mid B(0) = z\}$$

be the probability that a standard Brownian motion starting from B(0) = z
will reach the level x before time t. In equation (2.5) we gave an integral
that evaluated

$$H_t(0, x) = \Pr\{\tau_x \le t \mid B(0) = 0\}, \quad \text{for } x > 0.$$

The symmetry and spatial homogeneity of the Brownian motion make it
clear that $H_t(0, x) = H_t(x, 0)$. That is, the probability of reaching x > 0
starting from B(0) = 0 before time t is the same as the probability of
reaching 0 starting from B(0) = x. Consequently, from (2.7) we have

$$H_t(0, x) = H_t(x, 0) = \Pr\{\tau_0 \le t \mid B(0) = x\} = \int_0^t \frac{x}{\sqrt{2\pi}} \xi^{-3/2} e^{-x^2/(2\xi)}\, d\xi. \tag{2.9}$$


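We will condition on the value of the Brownian motion at time t and use
the law of total probability to derive (2.8). Accordingly, we have

$$\vartheta(t, t + s) = \int_{-\infty}^{\infty} \Pr\{B(u) = 0 \text{ for some } u \text{ in } (t, t + s] \mid B(t) = x\}\, \phi_t(x)\, dx,$$

where $\phi_t(x)$ is the probability density function for B(t) as given in (1.7).
Then, using (2.9),

$$\begin{aligned}
\vartheta(t, t + s) &= 2\int_0^{\infty} H_s(x, 0)\phi_t(x)\, dx \\
&= 2\int_0^{\infty} \left\{ \int_0^s \frac{x}{\sqrt{2\pi}} \xi^{-3/2} e^{-x^2/(2\xi)}\, d\xi \right\} \frac{1}{\sqrt{2\pi t}} e^{-x^2/(2t)}\, dx \\
&= \frac{1}{\pi\sqrt{t}} \int_0^s \left[ \int_0^{\infty} x e^{-x^2/(2\xi) - x^2/(2t)}\, dx \right] \xi^{-3/2}\, d\xi.
\end{aligned}$$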


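To evaluate the inner integral, we let

$$v = \frac{x^2}{2}\left(\frac{1}{\xi} + \frac{1}{t}\right),$$

whence

$$dv = x\left(\frac{1}{\xi} + \frac{1}{t}\right) dx,$$

and so

$$\vartheta(t, t + s) = \frac{1}{\pi\sqrt{t}} \int_0^s \left(\frac{1}{\xi} + \frac{1}{t}\right)^{-1} \left[\int_0^{\infty} e^{-v}\, dv\right] \xi^{-3/2}\, d\xi = \frac{\sqrt{t}}{\pi} \int_0^s \frac{d\xi}{(t + \xi)\sqrt{\xi}}.$$

The change of variable $\eta = \sqrt{\xi/t}$ gives

$$\vartheta(t, t + s) = \frac{2}{\pi} \int_0^{\sqrt{s/t}} \frac{d\eta}{1 + \eta^2} = \frac{2}{\pi} \arctan\sqrt{s/t}.$$

Finally, Exercise 2.2 asks the student to use standard trigonometric
identities to show the equivalence $\arctan\sqrt{s/t} = \arccos\sqrt{t/(t + s)}$.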
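The two closed forms in (2.8) are easy to compare numerically. The Python sketch below uses illustrative function names; it evaluates both expressions and shows, for instance, that a standard Brownian motion has probability one half of vanishing somewhere in the interval (1, 2].

```python
import math

def zero_crossing_prob_arctan(t, s):
    """(2/pi) arctan sqrt(s/t): chance of a zero of B in (t, t + s]."""
    return 2.0 / math.pi * math.atan(math.sqrt(s / t))

def zero_crossing_prob_arccos(t, s):
    """(2/pi) arccos sqrt(t/(t + s)): the equivalent form in (2.8)."""
    return 2.0 / math.pi * math.acos(math.sqrt(t / (t + s)))

print(zero_crossing_prob_arctan(1.0, 1.0))  # 0.5 up to rounding
print(zero_crossing_prob_arccos(1.0, 1.0))  # the same value
```

Only the ratio s/t matters, so the probability of a zero in (t, 2t] is one half for every t > 0.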


Exercises 


2.1. Let $\{B(t); t \ge 0\}$ be a standard Brownian motion, with B(0) = 0,
and let $M(t) = \max\{B(u); 0 \le u \le t\}$.

(a) Evaluate $\Pr\{M(4) \le 2\}$.
(b) Find the number c for which $\Pr\{M(9) > c\} = 0.10$.

2.2. Show that

$$\arctan\sqrt{s/t} = \arccos\sqrt{t/(s + t)}.$$

2.3. Suppose that net inflows to a reservoir are described by a standard 
Brownian motion. If at time 0 the reservoir has x = 3.29 units of water on 
hand, what is the probability that the reservoir never becomes empty in 
the first t = 4 units of time? 
2.4. Consider the simple random walk

$$S_n = \xi_1 + \cdots + \xi_n, \qquad S_0 = 0,$$

in which the summands are independent with $\Pr\{\xi = \pm 1\} = \tfrac{1}{2}$. Let
$M_n = \max_{0 \le k \le n} S_k$. Use a reflection argument to show that

$$\Pr\{M_n \ge a\} = 2\Pr\{S_n > a\} + \Pr\{S_n = a\}, \qquad a > 0.$$

2.5. Let $\tau_0$ be the largest zero of a standard Brownian motion not
exceeding a > 0. That is, $\tau_0 = \max\{u \ge 0; B(u) = 0 \text{ and } u \le a\}$. Show that

$$\Pr\{\tau_0 < t\} = \frac{2}{\pi} \arcsin\sqrt{t/a}.$$

2.6. Let $\tau_1$ be the smallest zero of a standard Brownian motion that
exceeds b > 0. Show that

$$\Pr\{\tau_1 < t\} = \frac{2}{\pi} \arccos\sqrt{b/t}.$$


Problems 


2.1. Find the conditional probability that a standard Brownian motion is
not zero in the interval (t, t + b] given that it is not zero in the interval
(t, t + a], where 0 < a < b and t > 0.

2.2. Find the conditional probability that a standard Brownian motion is
not zero in the interval (0, b] given that it is not zero in the interval (0, a],
where 0 < a < b. Hint: Let $t \to 0$ in the result of Problem 2.1.


2.3. For a fixed t > 0, show that M(t) and |B(t)| have the same marginal
probability distribution, whence

$$f_{M(t)}(z) = \frac{2}{\sqrt{t}} \phi\left(\frac{z}{\sqrt{t}}\right) \quad \text{for } z > 0.$$

(Here $M(t) = \max_{0 \le u \le t} B(u)$.) Show that

$$E[M(t)] = \sqrt{2t/\pi}.$$

For 0 < s < t, do (M(s), M(t)) have the same joint distribution as
(|B(s)|, |B(t)|)?

2.4. Use the reflection principle to obtain

$$\Pr\{M(t) \ge z, B(t) \le x\} = \Pr\{B(t) \ge 2z - x\} = 1 - \Phi\left(\frac{2z - x}{\sqrt{t}}\right) \quad \text{for } 0 < x < z.$$

(M(t) is the maximum defined in (2.1).) Differentiate with respect to x, and
then with respect to z, to obtain the joint density function for M(t) and
B(t):

$$f_{M(t),B(t)}(z, x) = \frac{2z - x}{t} \cdot \frac{2}{\sqrt{t}}\, \phi\left(\frac{2z - x}{\sqrt{t}}\right).$$


2.5. Show that the joint density function for M(t) and $Y(t) = M(t) - B(t)$
is given by

$$f_{M(t),Y(t)}(z, y) = \frac{z + y}{t} \cdot \frac{2}{\sqrt{t}}\, \phi\left(\frac{z + y}{\sqrt{t}}\right).$$

2.6. Use the result of Problem 2.5 to show that $Y(t) = M(t) - B(t)$ has
the same distribution as |B(t)|.


3. Variations and Extensions 


A variety of processes derived from Brownian motion find relevance and 
application in stochastic modeling. We briefly describe a few of these. 


3.1. Reflected Brownian Motion

Let $\{B(t); t \ge 0\}$ be a standard Brownian motion process. The stochastic
process

$$R(t) = |B(t)| = \begin{cases} B(t), & \text{if } B(t) \ge 0, \\ -B(t), & \text{if } B(t) < 0, \end{cases}$$

is called Brownian motion reflected at the origin, or, more briefly,
reflected Brownian motion. Reflected Brownian motion reverberates back to
positive values whenever it reaches the zero level and, thus, might be used
to model the movement of a pollen grain in the vicinity of a container
boundary that the grain cannot cross.

Since the moments of R(t) are the same as those of |B(t)|, the mean and
variance of reflected Brownian motion are easily determined. Under the
condition that R(0) = 0, for example, we have

$$E[R(t)] = \int_{-\infty}^{\infty} |x|\, \phi_t(x)\, dx = 2\int_0^{\infty} \frac{x}{\sqrt{2\pi t}} \exp(-x^2/2t)\, dx = \sqrt{2t/\pi}. \tag{3.1}$$

The integral was evaluated through the change of variable $y = x/\sqrt{t}$. Also,

$$\mathrm{Var}[R(t)] = E[R(t)^2] - \{E[R(t)]\}^2 = E[B(t)^2] - 2t/\pi = \left(1 - \frac{2}{\pi}\right)t. \tag{3.2}$$


Reflected Brownian motion is a second example of a continuous-time,
continuous-state-space Markov process. Its transition density p(y, t|x) is
derived from that of Brownian motion by differentiating

$$\Pr\{R(t) \le y \mid R(0) = x\} = \Pr\{-y \le B(t) \le y \mid B(0) = x\} = \int_{-y}^{y} \phi_t(z - x)\, dz$$

with respect to y to get

$$p(y, t|x) = \phi_t(y - x) + \phi_t(-y - x) = \phi_t(y - x) + \phi_t(y + x).$$


3.2. Absorbed Brownian Motion 


Suppose that the initial value B(0) = x of a standard Brownian motion
process is positive, and let $\tau$ be the first time that the process reaches zero.
The stochastic process

$$A(t) = \begin{cases} B(t), & \text{for } t \le \tau, \\ 0, & \text{for } t > \tau, \end{cases}$$

is called Brownian motion absorbed at the origin, which we will shorten
to absorbed Brownian motion. Absorbed Brownian motion might be used
to model the price of a share of stock in a company that becomes bankrupt
at some future instant. We can evaluate the transition probabilities for
absorbed Brownian motion by another use of the reflection principle
introduced in Section 2. For x > 0 and y > 0, let

$$G_t(x, y) = \Pr\{A(t) > y \mid A(0) = x\} = \Pr\{B(t) > y, \min_{0 \le u \le t} B(u) > 0 \mid B(0) = x\}. \tag{3.3}$$

To determine (3.3), we start with the obvious relation

$$\Pr\{B(t) > y \mid B(0) = x\} = G_t(x, y) + \Pr\{B(t) > y, \min_{0 \le u \le t} B(u) \le 0 \mid B(0) = x\}.$$


Figure 3.1 For every path B(u) starting at x, ending at B(t) > y, and reaching
zero in the interval, there is another path B*(u) starting at x and
ending at $B^*(t) < -y$.

The reflection principle is applied to the last term; Figure 3.1
is the appropriate picture to guide the analysis. We will argue that

$$\begin{aligned}
\Pr\{B(t) > y, \min_{0 \le u \le t} B(u) \le 0 \mid B(0) = x\}
&= \Pr\{B(t) < -y, \min_{0 \le u \le t} B(u) \le 0 \mid B(0) = x\} \\
&= \Pr\{B(t) < -y \mid B(0) = x\} = \Phi_t(-y - x).
\end{aligned} \tag{3.4}$$


The reasoning behind (3.4) goes as follows: Consider a path starting at
x > 0, satisfying B(t) > y, and which reaches zero at some intermediate
time $\tau$. By reflecting such a path about zero after time $\tau$, we obtain an
equally likely path starting from x and assuming a value below $-y$ at time
t. This implies the equality of the first two terms of (3.4). The equality of
the last terms is clear from their meaning, since the condition that the
minimum be below zero is superfluous in view of the requirement that the
path end below $-y$ (y > 0). Inserting (3.4) into (3.3) yields

$$\begin{aligned}
G_t(x, y) &= \Pr\{B(t) > y \mid B(0) = x\} - \Pr\{B(t) < -y \mid B(0) = x\} \\
&= 1 - \Phi_t(y - x) - \Phi_t(-(y + x)) \\
&= \Phi_t(y + x) - \Phi_t(y - x) \\
&= \int_{y - x}^{y + x} \phi_t(z)\, dz = \Phi\left(\frac{y + x}{\sqrt{t}}\right) - \Phi\left(\frac{y - x}{\sqrt{t}}\right).
\end{aligned} \tag{3.5}$$


From (3.3) and (3.5), we obtain the transition distribution for absorbed
Brownian motion:

$$\Pr\{A(t) > y \mid A(0) = x\} = \Phi\left(\frac{y + x}{\sqrt{t}}\right) - \Phi\left(\frac{y - x}{\sqrt{t}}\right). \tag{3.6}$$

Under the condition that A(0) = x > 0, A(t) is a random variable that has
both discrete and continuous parts. The discrete part is

$$\Pr\{A(t) = 0 \mid A(0) = x\} = 1 - G_t(x, 0) = 1 - \int_{-x}^{x} \phi_t(z)\, dz = 2[1 - \Phi_t(x)].$$

In the region y > 0, A(t) is a continuous random variable whose transition
density p(y, t|x) is obtained by differentiating with respect to y in (3.6) and
suitably changing the sign:

$$p(y, t|x) = \phi_t(y - x) - \phi_t(y + x).$$
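The discrete and continuous parts of the law of A(t) must together carry total probability one. The Python sketch below checks this numerically under the assumption that $\phi_t$ and $\Phi_t$ denote the density and distribution function of B(t), that is, of a normal variable with mean 0 and variance t. The function names and the integration cutoff are illustrative choices.

```python
import math

def phi_t(z, t):
    """Density of B(t) for a standard Brownian motion started at 0."""
    return math.exp(-z * z / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def absorbed_by(t, x):
    """Discrete part: Pr{A(t) = 0 | A(0) = x} = 2[1 - Phi(x / sqrt(t))]."""
    return 2.0 * (1.0 - std_normal_cdf(x / math.sqrt(t)))

def absorbed_density(y, t, x):
    """Continuous part p(y, t | x) = phi_t(y - x) - phi_t(y + x), for y > 0."""
    return phi_t(y - x, t) - phi_t(y + x, t)

def continuous_mass(t, x, steps=20000):
    """Midpoint-rule integral of the density over 0 < y < x + 12 sqrt(t)."""
    upper = x + 12.0 * math.sqrt(t)
    h = upper / steps
    return h * sum(absorbed_density((k + 0.5) * h, t, x) for k in range(steps))

total = absorbed_by(4.0, 1.0) + continuous_mass(4.0, 1.0)
print(absorbed_by(4.0, 1.0))  # about 0.6171
print(total)                  # close to 1
```

Starting from x = 1, absorption by time t = 4 is therefore more likely than not.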


3.3. The Brownian Bridge 


The Brownian bridge $\{B^0(t); t \ge 0\}$ is constructed from a standard
Brownian motion $\{B(t); t \ge 0\}$ by conditioning on the event {B(0) = B(1) = 0}.
The Brownian bridge is used to describe certain random functionals
arising in nonparametric statistics, and as a model for the publicly traded
prices of bonds having a specified redemption value on a fixed expiration
date.

We will determine the probability distribution for $B^0(t)$ by using the
conditional density formula for jointly normally distributed random
variables derived in II, Problem 4.8. First, for 0 < t < 1, the random variables
B(t) and $B(1) - B(t)$ are independent and normally distributed according
to the definition of Brownian motion. It follows that X = B(t) and
$Y = B(1) = B(t) + \{B(1) - B(t)\}$ have a joint normal distribution (see I,
Section 4.6) for which we have determined $\mu_X = \mu_Y = 0$, $\sigma_X = \sqrt{t}$, $\sigma_Y = 1$,
and $\rho = \mathrm{Cov}[X, Y]/\sigma_X\sigma_Y = \sqrt{t}$. Using the results of II, Problem 4.8, it then
follows that given Y = B(1) = y, the conditional distribution of X = B(t)
is normal, with

$$\mu_{X|Y} = \mu_X + \frac{\rho\sigma_X}{\sigma_Y}(y - \mu_Y) = yt = 0 \quad \text{when } y = 0,$$

and

$$\sigma_{X|Y} = \sigma_X\sqrt{1 - \rho^2} = \sqrt{t(1 - t)}.$$

For the Brownian bridge, $B^0(t)$ is normally distributed with $E[B^0(t)] = 0$
and $\mathrm{Var}[B^0(t)] = t(1 - t)$. Notice how the condition B(0) = B(1) = 0
causes the variance of $B^0(t)$ to vanish at t = 0 and t = 1.

The foregoing calculation of the variance can be extended to determine
the covariance function. Consider times s, t with 0 < s < t < 1. By first
obtaining the joint distribution of (B(s), B(t), B(1)), and then the
conditional joint distribution of (B(s), B(t)), given that B(1) = 0, one can verify
that the Brownian bridge is a normally distributed stochastic process with
mean zero and covariance function $\Gamma(s, t) = \mathrm{Cov}[B^0(s), B^0(t)] = s(1 - t)$,
for 0 < s < t < 1. (See Problem 3.3 for an alternative approach.)
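The covariance function can also be examined by simulation. The Python sketch below does not condition on B(1) = 0 directly; it relies instead on the representation $B^0(u) = B(u) - uB(1)$ stated in Problem 3.3, and estimates $\mathrm{Cov}[B^0(s), B^0(t)]$ by Monte Carlo. The function names, the seed, and the sample size are illustrative choices.

```python
import math
import random

def sample_bridge_pair(s, t, rng):
    """Return (B0(s), B0(t)) for 0 < s < t < 1 using B0(u) = B(u) - u B(1)."""
    b_s = rng.gauss(0.0, math.sqrt(s))               # B(s)
    b_t = b_s + rng.gauss(0.0, math.sqrt(t - s))     # B(t) by an independent increment
    b_1 = b_t + rng.gauss(0.0, math.sqrt(1.0 - t))   # B(1) by another increment
    return b_s - s * b_1, b_t - t * b_1

def estimate_bridge_cov(s, t, n_samples, seed=0):
    """Monte Carlo estimate of Cov[B0(s), B0(t)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x, y = sample_bridge_pair(s, t, rng)
        total += x * y  # the bridge has mean zero, so E[XY] is the covariance
    return total / n_samples

print(estimate_bridge_cov(0.3, 0.7, 20000))  # should be near s(1 - t) = 0.09
```

With 20000 samples the standard error of the estimate is roughly 0.002, so agreement to about two decimal places is all that should be expected.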


Example The Empirical Distribution Function Let $X_1, X_2, \ldots$ be
independent and identically distributed random variables. The empirical
cumulative distribution function corresponding to a sample of size N is
defined by

$$F_N(t) = \frac{1}{N} \#\{X_i \le t \text{ for } i = 1, \ldots, N\} = \frac{1}{N} \sum_{i=1}^{N} \xi_i(t), \tag{3.7}$$

where

$$\xi_i(t) = \begin{cases} 1, & \text{if } X_i \le t, \\ 0, & \text{if } X_i > t. \end{cases}$$

The empirical distribution function is an estimate, based on the observed
sample, of the true distribution function $F(t) = \Pr\{X \le t\}$. We will use the
central limit principle for random functions (Section 1.4) to approximate
the empirical distribution function by a Brownian bridge, assuming that
the observations are uniformly distributed over the interval (0, 1).
(Problem 3.9 calls for the student to explore the case of a general distribution.)
In the uniform case, F(t) = t for 0 < t < 1, and $\mu(t) = E[\xi(t)] = F(t) = t$.
For the higher moments, when 0 < s < t < 1, $E[\xi(s)\xi(t)] = F(s) = s$,
and $\Gamma(s, t) = \mathrm{Cov}[\xi(s), \xi(t)] = E[\xi(s)\xi(t)] - E[\xi(s)]E[\xi(t)] = s - st =
s(1 - t)$.

In view of (3.7), which expresses the empirical distribution function in
terms of a sum of independent and identically distributed random
functions, we might expect the central limit principle for random functions to
yield an approximation in terms of a Gaussian limit. Following the
guidelines in Section 1.4, we would expect that

$$X_N(t) = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \{\xi_i(t) - \mu(t)\} = \frac{N F_N(t) - Nt}{\sqrt{N}} = \sqrt{N}\{F_N(t) - t\}$$

would converge, in an appropriate sense, to a Gaussian process with zero
mean and covariance $\Gamma(s, t) = s(1 - t)$, for 0 < s < t < 1. As we have
just seen, this process is a Brownian bridge. Therefore, we would expect
the approximation

$$F_N(t) \approx t + \frac{1}{\sqrt{N}} B^0(t), \qquad 0 < t < 1.$$

Such approximations are heavily used in the theory of nonparametric
statistics.
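The scaling in this example can be tried out on simulated uniform observations. The Python sketch below, with illustrative function names, seed, and sample sizes, computes the empirical distribution function and the scaled deviation $\sqrt{N}\{F_N(t) - t\}$, then estimates the variance of that deviation over many replications for comparison with the Brownian bridge value t(1 - t).

```python
import math
import random

def empirical_cdf(sample, t):
    """F_N(t): fraction of observations that are at most t."""
    return sum(1 for x in sample if x <= t) / len(sample)

def scaled_deviation(sample, t):
    """X_N(t) = sqrt(N) {F_N(t) - t} for observations uniform on (0, 1)."""
    return math.sqrt(len(sample)) * (empirical_cdf(sample, t) - t)

def variance_of_scaled_deviation(t, n_obs, n_reps, seed=0):
    """Sample variance of X_N(t) across independent replications."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_reps):
        sample = [rng.random() for _ in range(n_obs)]
        values.append(scaled_deviation(sample, t))
    mean = sum(values) / n_reps
    return sum((v - mean) ** 2 for v in values) / (n_reps - 1)

print(empirical_cdf([0.1, 0.4, 0.8, 0.9], 0.5))                   # 0.5
print(variance_of_scaled_deviation(0.3, n_obs=200, n_reps=2000))  # near 0.3 * 0.7 = 0.21
```

For a single t the variance of $X_N(t)$ equals t(1 - t) for every N, since $N F_N(t)$ is binomial; the Brownian bridge approximation adds the joint behavior across different values of t.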


3.4. Brownian Meander 


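Brownian meander $\{B^+(t); t \ge 0\}$ is Brownian motion conditioned to be
positive. Recall (3.3) and (3.5):

$$G_t(x, y) = \Pr\{B(t) > y, \min_{0 \le u \le t} B(u) > 0 \mid B(0) = x\} = \Phi\left(\frac{y + x}{\sqrt{t}}\right) - \Phi\left(\frac{y - x}{\sqrt{t}}\right),$$

so that

$$G_t(x, 0) = \Pr\{\min_{0 \le u \le t} B(u) > 0 \mid B(0) = x\} = \Phi\left(\frac{x}{\sqrt{t}}\right) - \Phi\left(\frac{-x}{\sqrt{t}}\right).$$

The transition law for Brownian meander is

$$\Pr\{B^+(t) > y \mid B^+(0) = x\} = \Pr\{B(t) > y \mid \min_{0 \le u \le t} B(u) > 0, B(0) = x\},$$

whence

$$\Pr\{B^+(t) > y \mid B^+(0) = x\} = \frac{G_t(x, y)}{G_t(x, 0)}.$$

Of most interest is the limiting case as $x \to 0$,

$$\Pr\{B^+(t) > y \mid B^+(0) = 0\} = \lim_{x \to 0} \frac{\Phi\left(\frac{y + x}{\sqrt{t}}\right) - \Phi\left(\frac{y - x}{\sqrt{t}}\right)}{\Phi\left(\frac{x}{\sqrt{t}}\right) - \Phi\left(\frac{-x}{\sqrt{t}}\right)} = \frac{\phi(y/\sqrt{t})}{\phi(0)} = e^{-y^2/(2t)}.$$

A simple integration yields the mean

$$\begin{aligned}
E[B^+(t) \mid B^+(0) = 0] &= \int_0^{\infty} \Pr\{B^+(t) > y \mid B^+(0) = 0\}\, dy \\
&= \int_0^{\infty} e^{-y^2/(2t)}\, dy \\
&= \tfrac{1}{2}\sqrt{2\pi t} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi t}} e^{-y^2/(2t)}\, dy \\
&= \sqrt{\pi t/2}.
\end{aligned}$$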
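The mean of the Brownian meander can be confirmed by integrating the tail probability numerically. The Python sketch below uses illustrative function names and an integration cutoff of twelve standard deviations, both of which are assumptions of the sketch rather than part of the text.

```python
import math

def meander_tail(y, t):
    """Pr{B+(t) > y | B+(0) = 0} = exp(-y^2 / (2 t))."""
    return math.exp(-y * y / (2.0 * t))

def meander_mean_numeric(t, steps=20000):
    """Midpoint-rule integral of the tail probability over 0 < y < 12 sqrt(t)."""
    upper = 12.0 * math.sqrt(t)
    h = upper / steps
    return h * sum(meander_tail((k + 0.5) * h, t) for k in range(steps))

def meander_mean_exact(t):
    """Closed form sqrt(pi t / 2) obtained in the text."""
    return math.sqrt(math.pi * t / 2.0)

print(meander_mean_numeric(4.0), meander_mean_exact(4.0))  # both about 2.5066
```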


Exercises 


3.1. Show that the cumulative distribution function for reflected
Brownian motion is

$$\begin{aligned}
\Pr\{R(t) < y \mid R(0) = x\} &= \Phi\left(\frac{y - x}{\sqrt{t}}\right) - \Phi\left(\frac{-y - x}{\sqrt{t}}\right) \\
&= \Phi\left(\frac{y + x}{\sqrt{t}}\right) + \Phi\left(\frac{y - x}{\sqrt{t}}\right) - 1 \\
&= \Phi\left(\frac{x + y}{\sqrt{t}}\right) - \Phi\left(\frac{x - y}{\sqrt{t}}\right).
\end{aligned}$$

Evaluate this probability when x = 1, y = 3, and t = 4.

3.2. The price fluctuations of a share of stock of a certain company are
well described by a Brownian motion process. Suppose that the company
is bankrupt if ever the share price drops to zero. If the starting share price
is A(0) = 5, what is the probability that the company is bankrupt at time
t = 25? What is the probability that the share price is above 10 at time
t = 25?


3.3. The net inflow to a reservoir is well described by a Brownian
motion. Because a reservoir cannot contain a negative amount of water, we
suppose that the water level R(t) at time t is a reflected Brownian motion.
What is the probability that the reservoir contains more than 10 units of
water at time t = 25? Assume that the reservoir has unlimited capacity and
that R(0) = 5.


3.4. Suppose that the net inflows to a reservoir follow a Brownian mo- 
tion. Suppose that the reservoir was known to be empty 25 time units ago 
but has never been empty since. Use a Brownian meander process to eval- 
uate the probability that there is more than 10 units of water in the reser- 
voir today. 


3.5. Is reflected Brownian motion a Gaussian process? Is absorbed 
Brownian motion (cf. Section 1.4)? 


Problems 


3.1. Let $B_1(t)$ and $B_2(t)$ be independent standard Brownian motion
processes. Define

$$R(t) = \sqrt{B_1(t)^2 + B_2(t)^2}, \qquad t \ge 0.$$

R(t) is the radial distance to the origin in a two-dimensional Brownian
motion. Determine the mean of R(t).


3.2. Let B(t) be a standard Brownian motion process. Determine the 
conditional mean and variance of B(t), 0 < t < 1, given that B(1) = b. 


3.3. Let B(t) be a standard Brownian motion. Show that $B(u) - uB(1)$,
0 < u < 1, is independent of B(1).

(a) Use this to show that $B^0(t) = B(t) - tB(1)$, $0 \le t \le 1$, is a
Brownian bridge.

(b) Use the representation in (a) to evaluate the covariance function for
a Brownian bridge.

3.4. Let B(t) be a standard Brownian motion. Determine the covariance
function for

$$W^0(s) = (1 - s)B\left(\frac{s}{1 - s}\right), \qquad 0 < s < 1,$$

and compare it to that for a Brownian bridge.


3.5. Determine the expected value for absorbed Brownian motion A(t)
at time t = 1 by integrating the transition density (3.6) according to

$$E[A(1) \mid A(0) = x] = \int_0^{\infty} y\, p(y, 1|x)\, dy = \int_0^{\infty} y[\phi(y - x) - \phi(y + x)]\, dy.$$

The answer is E[A(1)|A(0) = x] = x. Show that E[A(t)|A(0) = x] = x for
all t > 0.

3.6. Let $M = \max\{A(t); t \ge 0\}$ be the largest value assumed by an
absorbed Brownian motion A(t). Show that $\Pr\{M > z \mid A(0) = x\} = x/z$ for
0 < x < z.

3.7. Let $t_0 = 0 < t_1 < t_2 < \cdots$ be time points, and define $X_n = A(t_n)$,
where A(t) is absorbed Brownian motion starting from A(0) = x. Show
that $\{X_n\}$ is a nonnegative martingale. Compare the maximal inequality
(5.7) in II with the result in Problem 3.6.

3.8. Show that the transition densities for both reflected Brownian
motion and absorbed Brownian motion satisfy the diffusion equation (1.3) in
the region $0 < x < \infty$.


3.9. Let F(t) be a cumulative distribution function and $B^0(u)$ a
Brownian bridge.

(a) Determine the covariance function for $B^0(F(t))$.

(b) Use the central limit principle for random functions to argue that
the empirical distribution functions for random variables obeying
F(t) might be approximated by the process in (a).


4. Brownian Motion with Drift 


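Let $\{B(t); t \ge 0\}$ be a standard Brownian motion process, and let $\mu$ and
$\sigma > 0$ be fixed. The Brownian motion with drift parameter $\mu$ and variance
parameter $\sigma^2$ is the process

$$X(t) = \mu t + \sigma B(t) \quad \text{for } t \ge 0. \tag{4.1}$$

Alternatively, Brownian motion with drift parameter $\mu$ and variance
parameter $\sigma^2$ is the process whose increments over disjoint time intervals are
independent (property (b) of the definition of standard Brownian motion)
and whose increments $X(t + s) - X(t)$, t, s > 0, are normally distributed
with mean $\mu s$ and variance $\sigma^2 s$. When X(0) = x, we have

$$\begin{aligned}
\Pr\{X(t) \le y \mid X(0) = x\} &= \Pr\{\mu t + \sigma B(t) \le y \mid \sigma B(0) = x\} \\
&= \Pr\left\{B(t) \le \frac{y - \mu t}{\sigma} \,\Big|\, B(0) = \frac{x}{\sigma}\right\} \\
&= \Phi_t\left(\frac{y - \mu t - x}{\sigma}\right) = \Phi\left(\frac{y - x - \mu t}{\sigma\sqrt{t}}\right).
\end{aligned}$$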
Brownian motion with drift is not symmetric when $\mu \ne 0$, and the
reflection principle cannot be used to compute the distribution of the maximum
of the process. We will use an infinitesimal first step analysis to determine
some properties of Brownian motion with drift. To set this up, let us
introduce some notation to describe changes in the Brownian motion with
drift over small time intervals of length $\Delta t$. We let $\Delta X = X(t + \Delta t) - X(t)$
and $\Delta B = B(t + \Delta t) - B(t)$. Then $\Delta X = \mu\Delta t + \sigma\Delta B$, and

$$X(t + \Delta t) = X(t) + \Delta X = X(t) + \mu\Delta t + \sigma\Delta B. \tag{4.2}$$

We observe that the conditional moments of $\Delta X$, given X(t) = x, are

$$E[\Delta X \mid X(t) = x] = \mu\Delta t + \sigma E[\Delta B] = \mu\Delta t, \tag{4.3}$$

$$\mathrm{Var}[\Delta X \mid X(t) = x] = \sigma^2 E[(\Delta B)^2] = \sigma^2\Delta t, \tag{4.4}$$

and

$$E[(\Delta X)^2 \mid X(t) = x] = \sigma^2\Delta t + (\mu\Delta t)^2 = \sigma^2\Delta t + o(\Delta t), \tag{4.5}$$

while

$$E[(\Delta X)^c] = o(\Delta t) \quad \text{for } c > 2. \tag{4.6}$$


4.1. The Gambler’s Ruin Problem 


Let us suppose that X(0) = x, and that a < x and b > x are fixed
quantities. We will be interested in some properties of the random time T at
which the process first assumes one of the values a or b. This so-called
hitting time is formally defined by

$$T = T_{ab} = \min\{t \ge 0; X(t) = a \text{ or } X(t) = b\}.$$


Analogous to the gambler’s ruin problem in a random walk (III, Section 
5.3), we will determine the probability that when the Brownian motion 
exits the interval (a, b), it does so at the point b. The solution for a stan- 
dard Brownian motion was obtained in Section 1 by using the invariance 
principle. Here we solve the problem for Brownian motion with drift by 
instituting an infinitesimal first step analysis. 


Theorem 4.1. For a Brownian motion with drift parameter $\mu$ and
variance parameter $\sigma^2$, and a < x < b,

$$u(x) = \Pr\{X(T_{ab}) = b \mid X(0) = x\} = \frac{e^{-2\mu x/\sigma^2} - e^{-2\mu a/\sigma^2}}{e^{-2\mu b/\sigma^2} - e^{-2\mu a/\sigma^2}}. \tag{4.7}$$

Proof Our proof is not entirely complete in that we will assume (i) that
u(x) is twice continuously differentiable, and (ii) that we can choose a
time increment $\Delta t$ so small that exiting the interval (a, b) prior to time $\Delta t$
can be neglected. With these provisos, at time $\Delta t$ the Brownian motion
will be at the position $X(0) + \Delta X = x + \Delta X$, and the conditional
probability of exiting at the upper point b is now $u(x + \Delta X)$. Invoking the law
of total probability, it must be that

$$u(x) = \Pr\{X(T) = b \mid X(0) = x\} = E[\Pr\{X(T) = b \mid X(0) = x, X(\Delta t) = x + \Delta X\} \mid X(0) = x] = E[u(x + \Delta X)],$$

where a < x < b.

The next step is to expand $u(x + \Delta X)$ in a Taylor series, whereby
$u(x + \Delta X) = u(x) + u'(x)\Delta X + \tfrac{1}{2}u''(x)(\Delta X)^2 + o(\Delta X)^2$. Then

$$u(x) = E[u(x + \Delta X)] = u(x) + u'(x)E[\Delta X] + \tfrac{1}{2}u''(x)E[(\Delta X)^2] + E[o(\Delta X)^2].$$

We use (4.3), (4.5), and (4.6) to evaluate the moments of $\Delta X$, obtaining

$$u(x) = u(x) + u'(x)\mu\Delta t + \tfrac{1}{2}u''(x)\sigma^2\Delta t + o(\Delta t),$$

which, after subtracting u(x), dividing by $\Delta t$, and letting $\Delta t \to 0$, becomes
the differential equation

$$0 = \mu u'(x) + \tfrac{1}{2}\sigma^2 u''(x) \quad \text{for } a < x < b. \tag{4.8}$$

The solution to (4.8) is

$$u(x) = Ae^{-2\mu x/\sigma^2} + B,$$

where A and B are constants of integration. These constants are
determined by the conditions u(a) = 0 and u(b) = 1. In words, the probability
of exiting at b if the process starts at a is zero, while the probability of
exiting at b if the process starts at b is one. When these conditions are used
to determine A and B, then (4.7) results.


Example Suppose that the fluctuations in the price of a share of stock
in a certain company are well described by a Brownian motion with drift
μ = 1/10 and variance σ² = 4. A speculator buys a share of this stock at
a price of $100 and will sell if ever the price rises to $110 (a profit) or
drops to $95 (a loss). What is the probability that the speculator sells at a
profit? We apply (4.7) with a = 95, x = 100, b = 110, and 2μ/σ² =
2(0.1)/4 = 1/20. Then

$$\Pr\{\text{Sell at profit}\} = \frac{e^{-100/20} - e^{-95/20}}{e^{-110/20} - e^{-95/20}} = 0.419.$$
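The arithmetic of this example can be reproduced with a few lines of Python. The function name `prob_exit_upper` is illustrative. The branch for zero drift is an addition made here: it uses the limit of (4.7) as the drift tends to zero, which is the linear formula (x - a)/(b - a).

```python
import math

def prob_exit_upper(x, a, b, mu, sigma2):
    """u(x) from (4.7): chance that X(t) reaches b before a, starting at x."""
    if not a < x < b:
        raise ValueError("need a < x < b")
    if mu == 0.0:
        # Driftless limit of (4.7); added here for completeness.
        return (x - a) / (b - a)
    k = 2.0 * mu / sigma2
    num = math.exp(-k * x) - math.exp(-k * a)
    den = math.exp(-k * b) - math.exp(-k * a)
    return num / den

print(round(prob_exit_upper(100.0, 95.0, 110.0, 0.1, 4.0), 3))  # 0.419
print(prob_exit_upper(100.0, 95.0, 110.0, 0.0, 4.0))            # 1/3 without drift
```

The positive drift raises the chance of selling at a profit from one third to about 0.42.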


The Mean Time to Exit an Interval 


Using another infinitesimal first step analysis, the mean time to exit an
interval may be determined for Brownian motion with drift.

Theorem 4.2. For a Brownian motion with drift parameter $\mu$ and
variance parameter $\sigma^2$, and a < x < b,

$$E[T_{ab} \mid X(0) = x] = \frac{1}{\mu}[u(x)(b - a) - (x - a)], \tag{4.9}$$

where u(x) is given in (4.7).


Proof Let $v(x) = E[T_{ab} \mid X(0) = x]$. As in the proof of Theorem 4.1, we
will assume (i) that v(x) is twice continuously differentiable, and (ii) that
we can choose a time increment $\Delta t$ so small that exiting the interval (a, b)
prior to time $\Delta t$ can be neglected. With these provisos, after time $\Delta t$ the
Brownian motion will be at the position $X(0) + \Delta X = x + \Delta X$, and the
conditional mean time to exit the interval is now $\Delta t + v(x + \Delta X)$.
Invoking the law of total probability, it must be that

$$v(x) = E[T \mid X(0) = x] = E[\Delta t + E\{T - \Delta t \mid X(0) = x, X(\Delta t) = x + \Delta X\} \mid X(0) = x] = \Delta t + E[v(x + \Delta X)],$$

where a < x < b.

The next step is to expand $v(x + \Delta X)$ in a Taylor series, whereby
$v(x + \Delta X) = v(x) + v'(x)\Delta X + \tfrac{1}{2}v''(x)(\Delta X)^2 + o(\Delta X)^2$. Then

$$v(x) = \Delta t + E[v(x + \Delta X)] = \Delta t + v(x) + v'(x)E[\Delta X] + \tfrac{1}{2}v''(x)E[(\Delta X)^2] + E[o(\Delta X)^2].$$

We use (4.3), (4.5), and (4.6) to evaluate the moments of $\Delta X$, obtaining

$$v(x) = \Delta t + v(x) + v'(x)\mu\Delta t + \tfrac{1}{2}v''(x)\sigma^2\Delta t + o(\Delta t),$$

which, after subtracting v(x), dividing by $\Delta t$, and letting $\Delta t \to 0$, becomes
the differential equation

$$-1 = \mu v'(x) + \tfrac{1}{2}\sigma^2 v''(x) \quad \text{for } a < x < b. \tag{4.10}$$


Since it takes no time to reach the boundary if the process starts at the 
boundary, the conditions are v(a) = v(b) = 0. Subject to these conditions, 
the solution to (4.10) is uniquely given by (4.9), as is easily verified (Prob- 
lem 4.1). 
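As an illustration of Theorem 4.2, the Python sketch below evaluates (4.9) for the speculator of the earlier example (a = 95, x = 100, b = 110, drift 1/10, variance 4) and checks by central differences that the result is consistent with the differential equation (4.10). The function names and the step size h are illustrative choices, and the sketch covers only nonzero drift, as (4.9) itself does.

```python
import math

def prob_exit_upper(x, a, b, mu, sigma2):
    """u(x) from (4.7), for nonzero drift mu."""
    k = 2.0 * mu / sigma2
    return (math.exp(-k * x) - math.exp(-k * a)) / (math.exp(-k * b) - math.exp(-k * a))

def mean_exit_time(x, a, b, mu, sigma2):
    """v(x) from (4.9); the formula as written requires mu != 0."""
    if mu == 0.0:
        raise ValueError("(4.9) as written needs a nonzero drift")
    u = prob_exit_upper(x, a, b, mu, sigma2)
    return (u * (b - a) - (x - a)) / mu

def ode_residual(x, a, b, mu, sigma2, h=1e-2):
    """Central-difference value of mu v' + (sigma^2 / 2) v'' + 1, which (4.10) says is 0."""
    v = lambda y: mean_exit_time(y, a, b, mu, sigma2)
    d1 = (v(x + h) - v(x - h)) / (2.0 * h)
    d2 = (v(x + h) - 2.0 * v(x) + v(x - h)) / (h * h)
    return mu * d1 + 0.5 * sigma2 * d2 + 1.0

print(mean_exit_time(100.0, 95.0, 110.0, 0.1, 4.0))  # about 12.88 time units
print(ode_residual(100.0, 95.0, 110.0, 0.1, 4.0))    # close to 0
```

The boundary conditions v(a) = v(b) = 0 can be checked the same way by evaluating `mean_exit_time` at x = 95 and x = 110.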


Example A Sequential Decision Procedure A Brownian motion X(t)
either (i) has drift $\mu = +\tfrac{1}{2}\delta > 0$, or (ii) has drift $\mu = -\tfrac{1}{2}\delta < 0$, and it is
desired to determine which is the case by observing the process. The
process will be monitored until it first reaches the level b > 0, in which
case we will decide that the drift is $\mu = +\tfrac{1}{2}\delta$, or until it first drops to the
level a < 0, which occurrence will cause us to decide in favor of $\mu = -\tfrac{1}{2}\delta$.
This decision procedure is, of course, open to error, but we can evaluate
these error probabilities and choose a and b so as to keep the error
probabilities acceptably small. We have

$$\begin{aligned}
\alpha &= \Pr\{\text{Decide } \mu = +\tfrac{1}{2}\delta \mid \mu = -\tfrac{1}{2}\delta\} \\
&= \Pr\{X(T) = b \mid E[X(t)] = -\tfrac{1}{2}\delta t\} \\
&= \frac{1 - e^{+\delta a/\sigma^2}}{e^{+\delta b/\sigma^2} - e^{+\delta a/\sigma^2}} \quad \text{(using (4.7))}
\end{aligned} \tag{4.11}$$

and

$$\begin{aligned}
1 - \beta &= \Pr\{\text{Decide } \mu = +\tfrac{1}{2}\delta \mid \mu = +\tfrac{1}{2}\delta\} \\
&= \Pr\{X(T) = b \mid E[X(t)] = +\tfrac{1}{2}\delta t\} \\
&= \frac{1 - e^{-\delta a/\sigma^2}}{e^{-\delta b/\sigma^2} - e^{-\delta a/\sigma^2}}.
\end{aligned} \tag{4.12}$$

If acceptable levels of the error probabilities $\alpha$ and $\beta$ are prescribed, then
we can solve in (4.11) and (4.12) to determine the boundaries to be used
in the decision procedure. The reader should verify that these boundaries
are

$$a = -\frac{\sigma^2}{\delta}\log\Bigl(\frac{1 - \alpha}{\beta}\Bigr), \quad\text{and}\quad b = \frac{\sigma^2}{\delta}\log\Bigl(\frac{1 - \beta}{\alpha}\Bigr). \tag{4.13}$$

For a numerical example, if $\sigma^2 = 4$ and we are attempting to decide
between $\mu = -\tfrac12$ and $\mu = +\tfrac12$, and the acceptable error probabilities are
chosen to be $\alpha = 0.05$ and $\beta = 0.10$, then the decision boundaries that should
be used are $a = -4\log(0.95/0.10) = -9.01$, and $b = 4\log(0.90/0.05) = 11.56$.
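
As a check on (4.13), the following sketch computes the boundaries for the numerical example and then substitutes them back into the exit probability (4.7) to recover the prescribed error levels. The function names are illustrative and not part of the text.

```python
import math

def boundaries(alpha, beta, delta, sigma2):
    """Decision boundaries (a, b) from (4.13)."""
    a = -(sigma2 / delta) * math.log((1 - alpha) / beta)
    b = (sigma2 / delta) * math.log((1 - beta) / alpha)
    return a, b

def prob_hit_b_first(a, b, mu, sigma2):
    """Pr{X(T) = b | X(0) = 0} for drift mu, as in (4.7)."""
    k = -2.0 * mu / sigma2
    return (1 - math.exp(k * a)) / (math.exp(k * b) - math.exp(k * a))

alpha, beta, delta, sigma2 = 0.05, 0.10, 1.0, 4.0
a, b = boundaries(alpha, beta, delta, sigma2)
alpha_check = prob_hit_b_first(a, b, -delta / 2, sigma2)       # decide + when drift is -
beta_check = 1 - prob_hit_b_first(a, b, +delta / 2, sigma2)    # decide - when drift is +
print(round(a, 2), round(b, 2), alpha_check, beta_check)
```

The two recovered probabilities should agree with $\alpha = 0.05$ and $\beta = 0.10$ up to rounding error, which confirms that (4.13) inverts (4.11) and (4.12).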

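In the above procedure for deciding the drift of a Brownian motion, the
observation duration until a decision is reached will be a random variable
whose mean will depend upon the true value of the drift. Using (4.9) with
$x = 0$ and $\mu$ replaced by $\pm\tfrac12\delta$ gives us the mean observation interval, as a
function of the true mean $\mu$:

$$E[T \mid \mu = -\tfrac12\delta] = 2\Bigl(\frac{\sigma}{\delta}\Bigr)^2\Bigl[(1 - \alpha)\log\Bigl(\frac{1 - \alpha}{\beta}\Bigr) - \alpha\log\Bigl(\frac{1 - \beta}{\alpha}\Bigr)\Bigr]$$

and

$$E[T \mid \mu = +\tfrac12\delta] = 2\Bigl(\frac{\sigma}{\delta}\Bigr)^2\Bigl[(1 - \beta)\log\Bigl(\frac{1 - \beta}{\alpha}\Bigr) - \beta\log\Bigl(\frac{1 - \alpha}{\beta}\Bigr)\Bigr].$$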
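
For the numerical example above ($\sigma^2 = 4$, $\delta = 1$, $\alpha = 0.05$, $\beta = 0.10$), the two mean observation times can be evaluated directly. The helper below is a sketch with an illustrative name. It implements the two displayed expressions, which follow from (4.9) with $x = 0$ and the boundaries (4.13).

```python
import math

def mean_duration(alpha, beta, delta, sigma2, true_sign):
    """Mean observation time when the true drift is true_sign * delta / 2."""
    la = math.log((1 - alpha) / beta)      # log term appearing in boundary a
    lb = math.log((1 - beta) / alpha)      # log term appearing in boundary b
    scale = 2.0 * sigma2 / delta**2        # 2 * (sigma / delta)^2
    if true_sign < 0:
        return scale * ((1 - alpha) * la - alpha * lb)
    return scale * ((1 - beta) * lb - beta * la)

t_minus = mean_duration(0.05, 0.10, 1.0, 4.0, -1)
t_plus = mean_duration(0.05, 0.10, 1.0, 4.0, +1)
print(t_minus, t_plus)
```

Under these assumptions the mean durations come out near 16 and 19 time units, respectively.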


We have developed a sequential decision procedure for evaluating the 
drift of a Brownian motion. However, invoking the invariance principle 
leads us to believe that similar results should maintain, at least approxi- 
mately, in analogous situations in which the Brownian motion is replaced 
by a partial sum process of independent and identically distributed sum- 
mands. The result is known as Wald’s approximation for his celebrated se- 
quential probability ratio test of a statistical hypothesis. 


The Maximum of a Brownian Motion with Negative Drift 


Consider a Brownian motion with drift $\{X(t)\}$, where the drift parameter
$\mu$ is negative. Over time, such a process will tend toward ever lower values,
and its maximum $M = \max\{X(t) - X(0);\ t \ge 0\}$ will be a well-defined
and finite random variable. Theorem 4.1 will enable us to show that
$M$ has an exponential distribution with parameter $2|\mu|/\sigma^2$. To see this, let
us suppose that $X(0) = 0$ and that $a < 0 < b$ are constants. Then Theorem
4.1 states that

$$\Pr\{X(T_{ab}) = b \mid X(0) = 0\} = \frac{1 - e^{-2\mu a/\sigma^2}}{e^{-2\mu b/\sigma^2} - e^{-2\mu a/\sigma^2}}, \tag{4.14}$$

where $T_{ab}$ is the random time at which the process first reaches $a < 0$ or
$b > 0$. That is, the probability that the Brownian motion reaches $b > 0$ before
it ever drops to $a < 0$ is given by the right side of (4.14). Because
both $\mu < 0$ and $a < 0$, then $a\mu > 0$ and

$$e^{-2\mu a/\sigma^2} \to 0 \quad\text{as } a \to -\infty,$$

and then

$$\lim_{a \to -\infty} \Pr\{X(T_{ab}) = b\} = \frac{1 - 0}{e^{-2\mu b/\sigma^2} - 0} = e^{2\mu b/\sigma^2} = e^{-2|\mu| b/\sigma^2}.$$

But as $a \to -\infty$, the left side of (4.14) becomes the probability that the
process ever reaches the point $b$, that is, the probability that the maximum
$M$ of the process ever exceeds $b$. We have deduced, then, the desired
exponential distribution

$$\Pr\{M > b\} = e^{-2|\mu| b/\sigma^2}, \quad b > 0. \tag{4.15}$$
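
The limiting argument can be watched numerically. The sketch below evaluates the right side of (4.14) for increasingly negative $a$ and compares it with the exponential tail (4.15). The parameter values are arbitrary choices for illustration.

```python
import math

def prob_reach_b_before_a(a, b, mu, sigma2):
    """Right side of (4.14), for X(0) = 0 and a < 0 < b."""
    k = -2.0 * mu / sigma2
    return (1 - math.exp(k * a)) / (math.exp(k * b) - math.exp(k * a))

mu, sigma2, b = -0.5, 4.0, 3.0                  # illustrative values, mu < 0
limit = math.exp(-2 * abs(mu) * b / sigma2)     # Pr{M > b} from (4.15)
for a in (-1.0, -5.0, -20.0, -80.0):
    print(a, prob_reach_b_before_a(a, b, mu, sigma2), limit)
```

The printed probabilities increase toward the limit as the lower barrier recedes, as one expects: removing the lower barrier can only make it easier to reach $b$.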


4.2. Geometric Brownian Motion 


A stochastic process $\{Z(t);\ t \ge 0\}$ is called a geometric Brownian motion
with drift parameter $\alpha$ if $X(t) = \log Z(t)$ is a Brownian motion with drift
$\mu = \alpha - \tfrac12\sigma^2$ and variance parameter $\sigma^2$. Equivalently, $Z(t)$ is geometric
Brownian motion starting from $Z(0) = z$ if

$$Z(t) = z e^{X(t)} = z e^{(\alpha - \frac12\sigma^2)t + \sigma B(t)}, \tag{4.16}$$

where $B(t)$ is a standard Brownian motion starting from $B(0) = 0$.

Modern mathematical economists usually prefer geometric Brownian
motion over Brownian motion as a model for prices of assets, say shares
of stock, that are traded in a perfect market. Such prices are nonnegative
and exhibit random fluctuations about a long-term exponential decay or
growth curve. Both of these properties are possessed by geometric
Brownian motion, but not by Brownian motion itself. More importantly,
if $t_0 < t_1 < \cdots < t_n$ are time points, then the successive ratios

$$\frac{Z(t_1)}{Z(t_0)},\ \frac{Z(t_2)}{Z(t_1)},\ \ldots,\ \frac{Z(t_n)}{Z(t_{n-1})}$$

are independent random variables, so that crudely speaking, the percentage
changes over nonoverlapping time intervals are independent.

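We turn to determining the mean and variance of geometric Brownian
motion. Let $\xi$ be a normally distributed random variable with mean zero
and variance one. We begin by establishing the formula

$$E[e^{\lambda\xi}] = e^{\frac12\lambda^2}, \quad -\infty < \lambda < \infty,$$

which results immediately from

$$1 = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac12(u - \lambda)^2}\,du \quad\text{(area under normal density)}$$

$$= e^{-\frac12\lambda^2}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{\lambda u - \frac12 u^2}\,du = e^{-\frac12\lambda^2} E[e^{\lambda\xi}].$$

To obtain the mean of geometric Brownian motion $Z(t) = z e^{X(t)} = z e^{(\alpha - \frac12\sigma^2)t + \sigma B(t)}$,
we use the fact that $\xi = B(t)/\sqrt{t}$ is normally distributed with
mean zero and variance one, whence

$$E[Z(t) \mid Z(0) = z] = z E\bigl[e^{(\alpha - \frac12\sigma^2)t + \sigma B(t)}\bigr] = z e^{(\alpha - \frac12\sigma^2)t} E\bigl[e^{\sigma\sqrt{t}\,\xi}\bigr] \quad (\xi = B(t)/\sqrt{t})$$

$$= z e^{(\alpha - \frac12\sigma^2)t} e^{\frac12\sigma^2 t} = z e^{\alpha t}. \tag{4.17}$$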


Equation (4.17) has interesting economic implications in the case where
$\alpha$ is positive but small relative to the variance parameter $\sigma^2$. On the one
hand, if $\alpha$ is positive, then the mean $E[Z(t)] = z\exp(\alpha t) \to \infty$ as $t \to \infty$.
On the other hand, if $\alpha$ is positive but $\alpha < \tfrac12\sigma^2$, then $\alpha - \tfrac12\sigma^2 < 0$, and
$X(t) = (\alpha - \tfrac12\sigma^2)t + \sigma B(t)$ is drifting in the negative direction. As a
consequence of the law of large numbers, it can be shown that $X(t) \to -\infty$ as
$t \to \infty$ under these circumstances, so that $Z(t) = z\exp[X(t)] \to 0$.
The geometric Brownian motion process is drifting ever closer to
zero, while simultaneously, its mean or expected value is continually
increasing! Here is yet another stochastic model in which the mean value
function is entirely misleading as a sole description of the process.
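
A few numbers make the contrast concrete. The sketch below takes $\alpha = 0.1$ and $\sigma^2 = 4$ (illustrative values with $0 < \alpha < \tfrac12\sigma^2$) and tabulates the mean from (4.17) next to two descriptions of a typical path: the median of $Z(t)$ and the probability that $Z(t)$ lies below its starting value. Both follow from the fact that $\log[Z(t)/z]$ is normal with mean $(\alpha - \tfrac12\sigma^2)t$ and variance $\sigma^2 t$. The function names are not from the text.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gbm_summary(z, alpha, sigma2, t):
    """Mean, median, and Pr{Z(t) < z} for geometric Brownian motion."""
    drift = alpha - 0.5 * sigma2              # drift of log Z(t)
    mean = z * math.exp(alpha * t)            # (4.17)
    median = z * math.exp(drift * t)          # exp of the median of log Z(t)
    p_below_start = std_normal_cdf(-drift * t / math.sqrt(sigma2 * t))
    return mean, median, p_below_start

for t in (1, 10, 100):
    print(t, gbm_summary(z=1.0, alpha=0.1, sigma2=4.0, t=t))
```

As $t$ grows, the mean increases without bound while the median collapses toward zero and the probability of being below the starting value approaches one.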

The variance of the geometric Brownian motion is derived in much the
same manner as the mean. First

$$E[Z(t)^2 \mid Z(0) = z] = z^2 E\bigl[e^{2X(t)}\bigr] = z^2 E\bigl[e^{2(\alpha - \frac12\sigma^2)t + 2\sigma B(t)}\bigr] = z^2 e^{2(\alpha + \frac12\sigma^2)t} \quad\text{(as in (4.17))},$$

and then

$$\mathrm{Var}[Z(t)] = E[Z(t)^2] - \{E[Z(t)]\}^2 = z^2 e^{2(\alpha + \frac12\sigma^2)t} - z^2 e^{2\alpha t} = z^2 e^{2\alpha t}\bigl(e^{\sigma^2 t} - 1\bigr). \tag{4.18}$$
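
Formulas (4.17) and (4.18) can be confirmed by integrating against the standard normal density numerically. The following is a minimal sketch using the trapezoidal rule. The integration limits, grid size, and parameter values are choices made for this illustration.

```python
import math

def normal_expectation(g, lo=-12.0, hi=12.0, n=20000):
    """E[g(xi)] for a standard normal xi, by the trapezoidal rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * g(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)

z, alpha, sigma2, t = 100.0, 0.1, 0.25, 2.0    # illustrative values
sigma = math.sqrt(sigma2)

def Z(xi):
    """Z(t) written as a function of xi = B(t)/sqrt(t), as in (4.16)."""
    return z * math.exp((alpha - 0.5 * sigma2) * t + sigma * math.sqrt(t) * xi)

mean_num = normal_expectation(Z)
var_num = normal_expectation(lambda xi: Z(xi) ** 2) - mean_num ** 2
mean_formula = z * math.exp(alpha * t)                                       # (4.17)
var_formula = z**2 * math.exp(2 * alpha * t) * (math.exp(sigma2 * t) - 1)    # (4.18)
print(mean_num, mean_formula, var_num, var_formula)
```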


Because of their close relation as expressed in the definition (4.16), many
results for Brownian motion can be directly translated into analogous
results for geometric Brownian motion. For example, let us translate the
gambler's ruin probability in Theorem 4.1. For $A < 1$ and $B > 1$, define

$$T = T_{A,B} = \min\Bigl\{t \ge 0;\ \frac{Z(t)}{Z(0)} = A \ \text{or}\ \frac{Z(t)}{Z(0)} = B\Bigr\}.$$

Theorem 4.3 For a geometric Brownian motion with drift parameter
$\alpha$ and variance parameter $\sigma^2$, and $A < 1 < B$,

$$\Pr\Bigl\{\frac{Z(T)}{Z(0)} = B\Bigr\} = \frac{1 - A^{1 - 2\alpha/\sigma^2}}{B^{1 - 2\alpha/\sigma^2} - A^{1 - 2\alpha/\sigma^2}}. \tag{4.19}$$

Example Suppose that the fluctuations in the price of a share of stock
in a certain company are well described by a geometric Brownian motion
with drift $\alpha = 1/10$ and variance $\sigma^2 = 4$. A speculator buys a share of this
stock at a price of \$100 and will sell if ever the price rises to \$110
(a profit) or drops to \$95 (a loss). What is the probability that the
speculator sells at a profit? We apply (4.19) with $A = 0.95$, $B = 1.10$ and
$1 - 2\alpha/\sigma^2 = 1 - 2(0.1)/4 = 0.95$. Then

$$\Pr\{\text{Sell at profit}\} = \frac{1 - 0.95^{0.95}}{1.10^{0.95} - 0.95^{0.95}} = 0.3342.$$
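
The calculation is easy to reproduce. The function below is a direct transcription of (4.19). Its name and its guard against the degenerate case $2\alpha = \sigma^2$ are additions made for this sketch.

```python
def prob_hit_B_before_A(A, B, alpha, sigma2):
    """Pr{Z(T)/Z(0) = B} from (4.19), for A < 1 < B."""
    p = 1.0 - 2.0 * alpha / sigma2
    if abs(p) < 1e-12:
        # (4.19) becomes 0/0 when 2*alpha == sigma2; that case is not handled here.
        raise ValueError("(4.19) is indeterminate when 2*alpha == sigma2")
    return (1.0 - A**p) / (B**p - A**p)

profit = prob_hit_B_before_A(A=0.95, B=1.10, alpha=0.1, sigma2=4.0)
print(round(profit, 4))   # close to the value 0.3342 quoted in the text
```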


Example The Black-Scholes Option Pricing Formula A call, or warrant,
is an option entitling the holder to buy a block of shares in a given
company at a specified price at any time during a stated interval. Thus the 
call listed in the financial section of the newspaper as 


Hewlett Aug $60 $6 


means that for a price of $6 per share, one may purchase the privilege (op- 
tion) of buying the stock of Hewlett-Packard at a price of $60 per share at 
any time between now and August (by convention, always the third Friday
of the month). The $60 figure is called the striking price. Since the most 
recent closing price of Hewlett was $59, the option of choosing when to 
buy, or not to buy at all, carries a premium of $7 = $60 + $6 — $59 over 
a direct purchase of the stock today. 

Should the price of Hewlett rise to, say, $70 between now and the third 
Friday of August, the owner of such an option could exercise it, buying at 
the striking price of $60 and immediately selling at the then current mar- 
ket price of $70 for a $10 profit, less, of course, the $6 cost of the option 
itself. On the other hand, should the price of Hewlett fall, the option 
owner’s loss is limited to his $6 cost of the option. Note that the seller 
(technically called the “writer’’) of the option has a profit limited to the $6 
that he receives for the option but could experience a huge loss should the 
price of Hewlett soar, say to $100. The writer would then either have to 
give up his own Hewlett shares or buy them at $100 on the open market 
in order to fulfill his obligation to sell them to the option holder at $60. 

What should such an option be worth? Is $6 for this privilege a fair 
price? While early researchers had studied these questions using a geo- 
metric Brownian motion model for the price fluctuations of the stock, they 
all assumed that the option should yield a higher mean return than the 
mean return from the stock itself because of the unlimited potential risk to 
the option writer. This assumption of a higher return was shown to be false 
in 1973 when Fisher Black, a financial consultant with a Ph.D. in applied 
mathematics, and Myron Scholes, an assistant professor in finance at MIT, 
published an entirely new and innovative analysis. In an idealized setting 
that included no transaction costs and an ability to borrow or lend limit- 
less amounts of capital at the same fixed interest rate, they showed that an 
owner, or a writer, of a call option could simultaneously buy or sell the un- 
derlying stock (“program trading’’) in such a way as to exactly match the 
returns of the option. Having available two investment opportunities with 
exactly the same return effectively eliminates all risk, or randomness, by 
allowing an investor to buy one while selling the other. The implications 
of their result are many. First, since writing an option potentially carries 
no risk, its return must be the same as that for other riskless investments 
in the economy. Otherwise, limitless profit opportunities bearing no risk 
would arise. Second, since owning an option carries no risk, one should 
not exercise it early, but hold it until its expiration date, when, if the mar- 
ket price exceeds the striking price, it should be exercised, and otherwise 
not. These two implications then lead to a third, a formula that established
the worth, or value, of the option. 

The Black-Scholes paper spawned hundreds, if not thousands, of fur- 
ther academic studies. At the same time, their valuation formula quickly 
invaded the financial world, where soon virtually all option trades were 
taking place at or near their Black-Scholes value. It is remarkable that the 
valuation formula was adopted so quickly in the real world in spite of the 
esoteric nature of its derivation and the ideal world of its assumptions. 

In order to present the Black-Scholes formula, we need some notation.
Let $S(t)$ be the price at time $t$ of a share of the stock under study. We
assume that $S(t)$ is described by a geometric Brownian motion with drift
parameter $\alpha$ and variance parameter $\sigma^2$. Let $F(z, \tau)$ be the value of an option,
where $z$ is the current price of the stock and $\tau$ is the time remaining until
expiration. Let $a$ be the striking price. When $\tau = 0$, and there is no time
remaining, one exercises the option for a profit of $z - a$ if $z > a$ (market
price greater than striking price) and does not exercise the option, but lets
it lapse, or expire, if $z \le a$. This leads to the condition

$$F(z, 0) = (z - a)^+ = \max\{z - a, 0\}.$$

The Black-Scholes analysis resulted in the valuation

$$F(z, \tau) = e^{-r\tau} E[(Z(\tau) - a)^+ \mid Z(0) = z], \tag{4.20}$$


where $r$ is the return rate for secure, or riskless, investments in the
economy, and where $Z(t)$ is a second geometric Brownian motion having drift
parameter $r$ and variance parameter $\sigma^2$. Looking at (4.20), the careful
reader will wonder whether we have made a mistake. No, the worth of the
option does not depend on the drift parameter $\alpha$ of the underlying stock.

In order to put the valuation formula into a useful form, we write


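$$Z(\tau) = z e^{(r - \frac12\sigma^2)\tau + \sigma\sqrt{\tau}\,\xi}, \quad \xi = B(\tau)/\sqrt{\tau}, \tag{4.21}$$

and observe that

$$z e^{(r - \frac12\sigma^2)\tau + \sigma\sqrt{\tau}\,\xi} > a$$

is the same as

$$\xi > v_0 = \frac{\log(a/z) - (r - \frac12\sigma^2)\tau}{\sigma\sqrt{\tau}}. \tag{4.22}$$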
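Then

$$e^{r\tau}F(z, \tau) = E[(Z(\tau) - a)^+ \mid Z(0) = z] = E\bigl[\bigl(z e^{(r - \frac12\sigma^2)\tau + \sigma\sqrt{\tau}\,\xi} - a\bigr)^+\bigr]$$

$$= \int_{v_0}^{\infty}\bigl[z e^{(r - \frac12\sigma^2)\tau + \sigma\sqrt{\tau}\,v} - a\bigr]\phi(v)\,dv = z e^{(r - \frac12\sigma^2)\tau}\int_{v_0}^{\infty} e^{\sigma\sqrt{\tau}\,v}\phi(v)\,dv - a\int_{v_0}^{\infty}\phi(v)\,dv.$$

Completing the square in the form

$$-\tfrac12 v^2 + \sigma\sqrt{\tau}\,v = -\tfrac12\bigl[(v - \sigma\sqrt{\tau})^2 - \sigma^2\tau\bigr]$$

shows that

$$e^{\sigma\sqrt{\tau}\,v}\phi(v) = e^{\frac12\sigma^2\tau}\phi(v - \sigma\sqrt{\tau}),$$

whence

$$e^{r\tau}F(z, \tau) = z e^{(r - \frac12\sigma^2)\tau} e^{\frac12\sigma^2\tau}\int_{v_0}^{\infty}\phi(v - \sigma\sqrt{\tau})\,dv - a[1 - \Phi(v_0)] = z e^{r\tau}\bigl[1 - \Phi(v_0 - \sigma\sqrt{\tau})\bigr] - a[1 - \Phi(v_0)].$$

Finally, note that

$$v_0 - \sigma\sqrt{\tau} = \frac{\log(a/z) - (r + \frac12\sigma^2)\tau}{\sigma\sqrt{\tau}}$$

and that

$$1 - \Phi(x) = \Phi(-x) \quad\text{and}\quad \log(a/z) = -\log(z/a)$$

to get, after multiplying by $e^{-r\tau}$, the end result

$$F(z, \tau) = z\,\Phi\Bigl(\frac{\log(z/a) + (r + \frac12\sigma^2)\tau}{\sigma\sqrt{\tau}}\Bigr) - a e^{-r\tau}\,\Phi\Bigl(\frac{\log(z/a) + (r - \frac12\sigma^2)\tau}{\sigma\sqrt{\tau}}\Bigr). \tag{4.23}$$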
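
The chain of equalities leading to (4.23) can be checked by comparing the closed form with a direct numerical integration of the discounted expectation (4.20). The sketch below does this with the midpoint rule. The parameter values, the truncation point of the integral, and the function names are assumptions of the illustration.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(z, a, tau, r, sigma):
    """Closed form (4.23)."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(z / a) + (r + 0.5 * sigma**2) * tau) / s
    d_minus = (math.log(z / a) + (r - 0.5 * sigma**2) * tau) / s
    return z * Phi(d_plus) - a * math.exp(-r * tau) * Phi(d_minus)

def discounted_expectation(z, a, tau, r, sigma, n=40000, hi=12.0):
    """Right side of (4.20): integrate over v from v0 to hi by the midpoint rule."""
    s = sigma * math.sqrt(tau)
    v0 = (math.log(a / z) - (r - 0.5 * sigma**2) * tau) / s      # (4.22)
    h = (hi - v0) / n
    total = 0.0
    for i in range(n):
        v = v0 + (i + 0.5) * h
        payoff = z * math.exp((r - 0.5 * sigma**2) * tau + s * v) - a
        total += payoff * math.exp(-0.5 * v * v)
    return math.exp(-r * tau) * total * h / math.sqrt(2.0 * math.pi)

args = dict(z=100.0, a=105.0, tau=0.75, r=0.04, sigma=0.25)   # illustrative values
closed = black_scholes_call(**args)
numeric = discounted_expectation(**args)
print(closed, numeric)
```

The two numbers should agree to several decimal places. Note that the drift $\alpha$ of the underlying stock appears nowhere in either computation.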
Equation (4.23) is the Black-Scholes valuation formula. Four of the five
factors that go into it are easily and objectively evaluated: the current
market price $z$, the striking price $a$, the time $\tau$ until the option expires, and
the rate $r$ of return from secure investments such as short-term government
securities. It is the fifth factor, $\sigma$, sometimes called the volatility, that
presents problems. It is, of course, possible to estimate this parameter
based on past records of price movements. However, it should be emphasized
that it is the volatility in the future that will affect the profitability of
the option, and when economic conditions are changing, past history may
not accurately indicate the future. One way around this difficulty is to
work backwards and use the Black-Scholes formula to impute a volatility
from an existing market price of the option. For example, for the
Hewlett-Packard call option that expires in August, six months (or $\tau = \tfrac12$
year) in the future, with a current price of Hewlett-Packard stock of \$59,
a striking price of \$60, and secure investments returning about
$r = 0.05$, a volatility of $\sigma = 0.35$ is consistent with the listed option price
of \$6. (When $\sigma = 0.35$ is used in the Black-Scholes formula, the resulting
valuation is \$6.03.) A volatility derived in this manner is called an
imputed or implied volatility. Someone who believes that the future will be
more variable might regard the option at \$6 as a good buy. Someone who
believes the future to be less variable than the imputed volatility of
$\sigma = 0.35$ might be inclined to offer a Hewlett-Packard option at \$6.

The following table compares actual offering prices on February 26, 1997,
for options in IBM stock with their Black-Scholes valuation using (4.23).
The current market price of IBM stock is \$146.50, and in all cases, the
same volatility $\sigma = 0.30$ was used.

Striking    Time to Expiration    Offered    Black-Scholes
Price a     (Years) τ             Price      Valuation F(z, τ)
130         1/12                  $17.00     $17.45
130         2/12                   19.25      18.87
135         1/12                   13.50      13.09
135         2/12                   15.13      14.92
140         1/12                    8.50       9.26
140         2/12                   12.00      11.46
145         1/12                    5.50       6.14
145         2/12                    9.13       8.52
145         5/12                   13.63      13.51
150         1/12                    3.13       3.80
150         2/12                    6.38       6.14
155         1/12                    1.63       2.18
155         2/12                    4.00       4.28
155         5/12                    9.75       9.05

The agreement between the actual option prices and their Black-Scholes
valuations seems quite good.
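
Working backwards from a market price to a volatility has no closed form, but because the value (4.23) increases with $\sigma$, a simple bisection suffices. The sketch below is one possible way to do it, not a method prescribed by the text. The bracketing interval, the tolerance, and the function names are assumptions. It uses the Hewlett-Packard figures quoted above.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(z, a, tau, r, sigma):
    """Valuation formula (4.23)."""
    s = sigma * math.sqrt(tau)
    d_plus = (math.log(z / a) + (r + 0.5 * sigma**2) * tau) / s
    return z * Phi(d_plus) - a * math.exp(-r * tau) * Phi(d_plus - s)

def implied_volatility(price, z, a, tau, r, lo=1e-4, hi=5.0, tol=1e-8):
    """Bisection on sigma; relies on the call value increasing in sigma."""
    low_value = black_scholes_call(z, a, tau, r, lo)
    high_value = black_scholes_call(z, a, tau, r, hi)
    if not low_value <= price <= high_value:
        raise ValueError("price is outside the range bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if black_scholes_call(z, a, tau, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

value = black_scholes_call(z=59.0, a=60.0, tau=0.5, r=0.05, sigma=0.35)
vol = implied_volatility(price=6.00, z=59.0, a=60.0, tau=0.5, r=0.05)
print(round(value, 2), round(vol, 3))
```

The valuation at $\sigma = 0.35$ should land close to the \$6.03 quoted in the text, and the volatility implied by the \$6 listing should come out slightly below 0.35.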


Exercises 


4.1. A Brownian motion $\{X(t)\}$ has parameters $\mu = -0.1$ and $\sigma = 2$.
What is the probability that the process is above $y = 9$ at time $t = 4$, given
that it starts at $x = 2.82$?


4.2. A Brownian motion $\{X(t)\}$ has parameters $\mu = 0.1$ and $\sigma = 2$.
Evaluate the probability of exiting the interval $(a, b]$ at the point $b$ starting
from $X(0) = 0$ for $b = 1$, 10, and 100 and $a = -b$. Why do the probabilities
change when $a/b$ is the same in all cases?


4.3. A Brownian motion $\{X(t)\}$ has parameters $\mu = 0.1$ and $\sigma = 2$.
Evaluate the mean time to exit the interval $(a, b]$ from $X(0) = 0$ for $b = 1$,
10, and 100 and $a = -b$. Can you guess how this mean time varies with
$b$ for $b$ large?


4.4. A Brownian motion $X(t)$ either (i) has drift $\mu = +\tfrac12\delta > 0$, or (ii) has
drift $\mu = -\tfrac12\delta < 0$, and it is desired to determine which is the case by
observing the process for a fixed duration $\tau$. If $X(\tau) > 0$, then the decision
will be that $\mu = +\tfrac12\delta$; if $X(\tau) \le 0$, then $\mu = -\tfrac12\delta$ will be stated. What
should be the length $\tau$ of the observation period if the design error
probabilities are set at $\alpha = \beta = 0.05$? Use $\delta = 1$ and $\sigma = 2$. Compare this fixed
duration with the average duration of the sequential decision plan in the
example of Section 4.1.

4.5. Suppose that the fluctuations in the price of a share of stock in a
certain company are well described by a geometric Brownian motion with
drift $\alpha = -0.1$ and variance $\sigma^2 = 4$. A speculator buys a share of this
stock at a price of \$100 and will sell if ever the price rises to \$110 (a
profit) or drops to \$95 (a loss). What is the probability that the speculator
sells at a profit?


4.6. Let $\xi$ be a standard normal random variable.
(a) For an arbitrary constant $a$, show that

$$E[(\xi - a)^+] = \phi(a) - a[1 - \Phi(a)].$$

(b) Let $X$ be normally distributed with mean $\mu$ and variance $\sigma^2$. Show
that

$$E[(X - b)^+] = \sigma\Bigl\{\phi\Bigl(\frac{b - \mu}{\sigma}\Bigr) - \Bigl(\frac{b - \mu}{\sigma}\Bigr)\Bigl[1 - \Phi\Bigl(\frac{b - \mu}{\sigma}\Bigr)\Bigr]\Bigr\}.$$


Problems 


4.1. What is the probability that a standard Brownian motion {B(t)} 
ever crosses the line a + bt (a > 0,b > 0)? 


4.2. Show that

$$\Pr\Bigl\{\max_{t \ge 0}\frac{b + B(t)}{1 + t} > a\Bigr\} = e^{-2a(a - b)}, \quad a > 0,\ b < a.$$


4.3. If $B^0(s)$, $0 < s < 1$, is a Brownian bridge process, then

$$B(t) = (1 + t)\,B^0\Bigl(\frac{t}{1 + t}\Bigr)$$

is a standard Brownian motion. Use this representation and the result of
Problem 4.2 to show that for a Brownian bridge $B^0(t)$,

$$\Pr\Bigl\{\max_{0 \le u \le 1} B^0(u) > a\Bigr\} = e^{-2a^2}.$$


4.4. A Brownian motion $X(t)$ either (i) has drift $\mu = \mu_0$, or (ii) has drift
$\mu = \mu_1$, where $\mu_0 < \mu_1$ are known constants. It is desired to determine
which is the case by observing the process. Derive a sequential decision
procedure that meets prespecified error probabilities $\alpha$ and $\beta$. Hint: Base
your decision on the process $X'(t) = X(t) - \tfrac12(\mu_0 + \mu_1)t$.


4.5. Change a Brownian motion with drift $X(t)$ into an absorbed Brownian
motion with drift $X^A(t)$ by defining

$$X^A(t) = \begin{cases} X(t), & \text{for } t < \tau, \\ 0, & \text{for } t \ge \tau, \end{cases}$$

where

$$\tau = \min\{t \ge 0;\ X(t) = 0\}.$$

(We suppose that $X(0) = x > 0$ and that $\mu < 0$, so that absorption is sure
to occur eventually.) What is the probability that the absorbed Brownian
motion ever reaches the height $b > x$?


4.6. What is the probability that a geometric Brownian motion with drift 
parameter a = 0 ever rises to more than twice its initial value? (You buy 
stock whose fluctuations are described by a geometric Brownian motion 
with a = 0. What are your chances to double your money?) 


4.7. A call option is said to be "in the money" if the market price of the
stock is higher than the striking price. Suppose that the stock follows a
geometric Brownian motion with drift $\alpha$, variance $\sigma^2$, and has a current
market price of $z$. What is the probability that the option is in the money
at the expiration time $\tau$? The striking price is $a$.


4.8. Verify the Hewlett-Packard option valuation of \$6.03 stated in the
text when $\tau = \tfrac12$, $z = \$59$, $a = 60$, $r = 0.05$ and $\sigma = 0.35$. What is the
Black-Scholes valuation if $\sigma = 0.30$?


4.9. Let $\tau$ be the first time that a standard Brownian motion $B(t)$ starting
from $B(0) = x > 0$ reaches zero. Let $\lambda$ be a positive constant. Show that

$$w(x) = E[e^{-\lambda\tau} \mid B(0) = x] = e^{-\sqrt{2\lambda}\,x}.$$

Hint: Develop an appropriate differential equation by instituting an
infinitesimal first step analysis according to

$$w(x) = E\bigl[E\{e^{-\lambda\tau} \mid B(\Delta t)\} \bigm| B(0) = x\bigr] = E\bigl[e^{-\lambda\Delta t}\,w(x + \Delta B)\bigr].$$


4.10. Let $t_0 = 0 < t_1 < t_2 < \cdots$ be time points, and define
$X_n = Z(t_n)\exp(-r t_n)$, where $Z(t)$ is geometric Brownian motion with drift
parameter $r$ and variance parameter $\sigma^2$ (see the geometric Brownian motion in
the Black-Scholes formula (4.20)). Show that $\{X_n\}$ is a martingale.


5. The Ornstein-Uhlenbeck Process*


* This section contains material of a more specialized nature.


The Ornstein-Uhlenbeck process $\{V(t);\ t \ge 0\}$ has two parameters, a drift
coefficient $\beta > 0$ and a diffusion parameter $\sigma^2$. The process, starting from
$V(0) = v$, is defined in terms of a standard Brownian motion $\{B(t)\}$ by
scale changes in both space and time:

$$V(t) = v e^{-\beta t} + \frac{\sigma e^{-\beta t}}{\sqrt{2\beta}}\,B\bigl(e^{2\beta t} - 1\bigr), \quad\text{for } t \ge 0. \tag{5.1}$$

The first term on the right of (5.1) describes an exponentially decreasing
trend towards the origin. The second term represents the fluctuations
about this trend in terms of a rescaled Brownian motion. The
Ornstein-Uhlenbeck process is another example of a continuous-state-space,
continuous-time Markov process having continuous paths, inheriting
these properties from the Brownian motion in the representation (5.1). It
is a Gaussian process (see the discussion in Section 1.4), and (5.1) easily
shows its mean and variance to be

$$E[V(t) \mid V(0) = v] = v e^{-\beta t}, \tag{5.2}$$

and

$$\mathrm{Var}[V(t) \mid V(0) = x] = \frac{\sigma^2 e^{-2\beta t}}{2\beta}\,\mathrm{Var}\bigl[B\bigl(e^{2\beta t} - 1\bigr)\bigr] = \sigma^2\Bigl(\frac{1 - e^{-2\beta t}}{2\beta}\Bigr). \tag{5.3}$$

Knowledge of the mean and variance of a normally distributed random
variable allows its cumulative distribution function to be written in terms
of the standard normal distribution (1.6), and by this means we can
immediately express the transition distribution for the Ornstein-Uhlenbeck
process as

$$\Pr\{V(t) \le y \mid V(0) = x\} = \Phi\Bigl(\frac{\sqrt{2\beta}\,(y - x e^{-\beta t})}{\sigma\sqrt{1 - e^{-2\beta t}}}\Bigr). \tag{5.4}$$

The Covariance Function 


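Suppose that $0 < u < s$, and that $V(0) = x$. Upon subtracting the mean as
given by (5.2), we obtain

$$\mathrm{Cov}[V(u), V(s)] = E\bigl[\{V(u) - x e^{-\beta u}\}\{V(s) - x e^{-\beta s}\}\bigr]$$

$$= \frac{\sigma^2}{2\beta}\,e^{-\beta(u + s)}\,E\bigl[\{B(e^{2\beta u} - 1)\}\{B(e^{2\beta s} - 1)\}\bigr]$$

$$= \frac{\sigma^2}{2\beta}\,e^{-\beta(u + s)}\bigl(e^{2\beta u} - 1\bigr) = \frac{\sigma^2}{2\beta}\bigl(e^{-\beta(s - u)} - e^{-\beta(s + u)}\bigr). \tag{5.5}$$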

5.1. A Second Approach to Physical Brownian Motion 


The path that we have taken to introduce the Ornstein—Uhlenbeck process 
is not faithful to the way in which the process came about. To begin an ex- 
planation, let us recognize that all models of physical phenomena have de- 
ficiencies, and the Brownian motion stochastic process as a model for the 
Brownian motion of a particle is no exception. If B(t) is the position of a 
pollen grain at time ¢ and if this position is changing over time, then the 
pollen grain must have a velocity. Velocity is the infinitesimal change in 
position over infinitesimal time, and where B(t) is the position of the 
pollen grain at time t, the velocity of the grain would be the derivative 
dB(t)/dt. But while the paths of the Brownian motion stochastic process 
are continuous, they are not differentiable. This remarkable statement is 
difficult to comprehend. Indeed, many elementary calculus explanations 
implicitly tend to assume that all continuous functions are differentiable, 
and if we were to be asked to find an example of one that wasn’t, we might 
consider it quite a challenge. Yet each path of a continuous Brownian
motion stochastic process is (with probability one) differentiable at no point.
We have encountered yet another intriguing facet of stochastic processes
that we cannot treat in full detail but must leave for future study. We will
attempt some motivation, however. Recall that the variance of the Brownian
increment $\Delta B$ is $\Delta t$. But variations in the normal distribution are not
scaled in terms of the variance, but in terms of its square root, the standard
deviation, so that the Brownian increment $\Delta B$ is roughly on the order of
$\sqrt{\Delta t}$, and the approximate derivative

$$\frac{\Delta B}{\Delta t} \approx \frac{\sqrt{\Delta t}}{\Delta t} = \frac{1}{\sqrt{\Delta t}}$$

is roughly on the order of $1/\sqrt{\Delta t}$. This, of course, becomes infinite as
$\Delta t \to 0$, which suggests that a derivative of Brownian motion, were it to
exist, could only take the values $\pm\infty$. As a consequence, the Brownian
path cannot have a derivative. The reader can see from our attempt at
explanation that the topic is well beyond the scope of an introductory text.

Although its movements may be erratic, a pollen grain, being a physical
object of positive mass, must have a velocity, and the Ornstein-Uhlenbeck
process arose as an attempt to model this velocity directly.
Two factors are postulated to affect the particle's velocity over a small
time interval. First, the frictional resistance or viscosity of the surrounding
medium is assumed to reduce the magnitude of the velocity by a
deterministic proportional amount, the constant of proportionality being
$\beta > 0$. Second, there are random changes in velocity caused by collisions
with neighboring molecules, the magnitude of these random changes
being measured by a variance coefficient $\sigma^2$. That is, if $V(t)$ is the velocity
at time $t$, and $\Delta V$ is the change in velocity over $(t, t + \Delta t]$, we might
express the viscosity factor as

$$E[\Delta V \mid V(t) = v] = -\beta v\,\Delta t + o(\Delta t) \tag{5.6}$$

and the random factor by

$$\mathrm{Var}[\Delta V \mid V(t) = v] = \sigma^2\,\Delta t + o(\Delta t). \tag{5.7}$$


The Ornstein-Uhlenbeck process was developed by taking (5.6) and (5.7)
together with the Markov property as the postulates, and from them
deriving the transition probabilities (5.4). While we have chosen not to
follow this path, we will verify that the mean and variance given in (5.2) and
(5.3) do satisfy (5.6) and (5.7) over small time increments. Beginning with
(5.2) and the Markov property, the first step is

$$E[V(t + \Delta t) \mid V(t) = v] = v e^{-\beta\Delta t} = v[1 - \beta\,\Delta t + o(\Delta t)],$$

and then

$$E[\Delta V \mid V(t) = v] = E[V(t + \Delta t) \mid V(t) = v] - v = -\beta v\,\Delta t + o(\Delta t),$$

and over small time intervals the mean change in velocity is the proportional
decrease desired in (5.6). For the variance, we have

$$\mathrm{Var}[\Delta V \mid V(t) = v] = \mathrm{Var}[V(t + \Delta t) \mid V(t) = v] = \sigma^2\Bigl(\frac{1 - e^{-2\beta\Delta t}}{2\beta}\Bigr) = \sigma^2\,\Delta t + o(\Delta t),$$

and the variance of the velocity increment behaves as desired in (5.7). In
fact, (5.6) and (5.7) together with the Markov property can be taken as the
definition of the Ornstein-Uhlenbeck process in much the same way, but
involving far deeper analysis, that the infinitesimal postulates of V,
Section 2.1 serve to define the Poisson process.
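
The two limits just verified can be observed numerically by dividing the exact mean change and the exact variance by ever smaller time increments. In the sketch below the parameter values and function names are arbitrary illustrations.

```python
import math

def ou_mean(v, beta, t):
    """Conditional mean (5.2)."""
    return v * math.exp(-beta * t)

def ou_var(beta, sigma2, t):
    """Conditional variance (5.3)."""
    return sigma2 * (1.0 - math.exp(-2.0 * beta * t)) / (2.0 * beta)

v, beta, sigma2 = 1.5, 0.8, 2.0    # illustrative values
for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    drift_rate = (ou_mean(v, beta, dt) - v) / dt     # should approach -beta * v, as in (5.6)
    var_rate = ou_var(beta, sigma2, dt) / dt         # should approach sigma2, as in (5.7)
    print(dt, drift_rate, var_rate)
```

As the increment shrinks, the first ratio settles at $-\beta v$ and the second at $\sigma^2$.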


Example Tracking Error Let $V(t)$ be the measurement error of a radar
system that is attempting to track a randomly moving target. We assume
$V(t)$ to be an Ornstein-Uhlenbeck process. The mean increment
$E[\Delta V \mid V(t) = v] = -\beta v\,\Delta t + o(\Delta t)$ represents the controller's effort to
reduce the current error, while the variance term reflects the unpredictable
motion of the target. If $\beta = 0.1$, $\sigma = 2$, and the system starts on target
($v = 0$), the probability that the error is less than one at time $t = 1$ is,
using (5.4),

$$\Pr\{|V(1)| < 1\} = \Phi\Bigl(\frac{\sqrt{2\beta}}{\sigma\sqrt{1 - e^{-2\beta t}}}\Bigr) - \Phi\Bigl(\frac{-\sqrt{2\beta}}{\sigma\sqrt{1 - e^{-2\beta t}}}\Bigr)$$

$$= \Phi\Bigl(\frac{1}{\sqrt{20(1 - e^{-0.2})}}\Bigr) - \Phi\Bigl(\frac{-1}{\sqrt{20(1 - e^{-0.2})}}\Bigr)$$

$$= \Phi(0.53) - \Phi(-0.53) = 0.4038.$$

As time passes, this near-target probability drops to
$\Phi(1/\sqrt{20}) - \Phi(-1/\sqrt{20}) = \Phi(0.22) - \Phi(-0.22) = 0.1742$.
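
The same probabilities can be computed without rounding the argument of $\Phi$ to two decimals, as table lookup requires. The sketch below evaluates (5.4) directly. The function names are illustrative.

```python
import math

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ou_cdf(y, x, t, beta, sigma):
    """Pr{V(t) <= y | V(0) = x} from (5.4)."""
    sd = sigma * math.sqrt((1.0 - math.exp(-2.0 * beta * t)) / (2.0 * beta))
    return Phi((y - x * math.exp(-beta * t)) / sd)

beta, sigma = 0.1, 2.0
p_t1 = ou_cdf(1.0, 0.0, 1.0, beta, sigma) - ou_cdf(-1.0, 0.0, 1.0, beta, sigma)
limit_sd = sigma / math.sqrt(2.0 * beta)          # long-run standard deviation
p_long_run = Phi(1.0 / limit_sd) - Phi(-1.0 / limit_sd)
print(round(p_t1, 4), round(p_long_run, 4))
```

Because the arguments are not rounded here, the results differ from the tabled values 0.4038 and 0.1742 in the third decimal place.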


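Example Genetic Fluctuations under Mutation In VI, Section 4, we
introduced a model describing fluctuations in gene frequency in a population
of $N$ individuals, each either of gene type a or gene type A. With $X(t)$
being the number of type a individuals at time $t$, we reasoned that $X(t)$
would be a birth and death process with parameters

$$\lambda_j = \lambda N\Bigl(1 - \frac{j}{N}\Bigr)\Bigl[\frac{j}{N}(1 - \gamma_1) + \Bigl(1 - \frac{j}{N}\Bigr)\gamma_2\Bigr]$$

and

$$\mu_j = \lambda N\,\frac{j}{N}\Bigl[\frac{j}{N}\gamma_1 + \Bigl(1 - \frac{j}{N}\Bigr)(1 - \gamma_2)\Bigr].$$

The parameters $\gamma_1$ and $\gamma_2$ measured the rate of mutation from a-type to
A-type, and A-type to a-type, respectively. Here we attempt a simplified
description of the model when the population size $N$ is large. The
steady-state fluctuations in the relative gene frequency $X(t)/N$ are centered
on the mean

$$\pi = \frac{\gamma_2}{\gamma_1 + \gamma_2}.$$

Accordingly, we define the rescaled and centered process

$$V_N(t) = \sqrt{N}\Bigl(\frac{X(t)}{N} - \pi\Bigr).$$

With

$$\Delta V = V_N(t + \Delta t) - V_N(t), \quad\text{and}\quad \Delta X = X(t + \Delta t) - X(t),$$

we have

$$E[\Delta X \mid X(t) = j] = (\lambda_j - \mu_j)\,\Delta t + o(\Delta t),$$

which becomes, after substitution and simplification,

$$E[\Delta X \mid X(t) = j] = N\lambda\Bigl[\Bigl(1 - \frac{j}{N}\Bigr)\gamma_2 - \frac{j}{N}\gamma_1\Bigr]\Delta t + o(\Delta t).$$

More tedious calculations show that

$$E[(\Delta X)^2 \mid X(t) = j] = N\lambda\Bigl[2\,\frac{j}{N}\Bigl(1 - \frac{j}{N}\Bigr)(1 - \gamma_1 - \gamma_2) + \frac{j}{N}\gamma_1 + \Bigl(1 - \frac{j}{N}\Bigr)\gamma_2\Bigr]\Delta t + o(\Delta t).$$

Our next step is to rescale these in terms of $v$, using

$$\frac{j}{N} = \pi + \frac{v}{\sqrt{N}}.$$

In the rescaled variables,

$$E[\Delta V \mid V_N(t) = v] = \frac{1}{\sqrt{N}}\,E\Bigl[\Delta X \Bigm| \frac{X(t)}{N} = \pi + \frac{v}{\sqrt{N}}\Bigr]$$

$$= \lambda\sqrt{N}\Bigl[\Bigl(1 - \pi - \frac{v}{\sqrt{N}}\Bigr)\gamma_2 - \Bigl(\pi + \frac{v}{\sqrt{N}}\Bigr)\gamma_1\Bigr]\Delta t + o(\Delta t)$$

$$= -\lambda(\gamma_1 + \gamma_2)\,v\,\Delta t + o(\Delta t).$$

A similar substitution shows that, up to terms that vanish as $N \to \infty$,

$$E[(\Delta V)^2 \mid V_N(t) = v] = \frac{2\lambda\gamma_1\gamma_2}{(\gamma_1 + \gamma_2)^2}\,\Delta t + o(\Delta t).$$

Similar computations show that the higher moments of $\Delta V$ are negligible.
Since the relations (5.6) and (5.7) serve to characterize the
Ornstein-Uhlenbeck process, the evidence is compelling that the rescaled
gene processes $\{V_N(t)\}$ will converge in some appropriate sense to an
Ornstein-Uhlenbeck process $V(t)$ with

$$\beta = \lambda(\gamma_1 + \gamma_2) \quad\text{and}\quad \sigma^2 = \frac{2\lambda\gamma_1\gamma_2}{(\gamma_1 + \gamma_2)^2},$$

and

$$X(t) \approx N\pi + \sqrt{N}\,V(t) \quad\text{for large } N.$$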

This is indeed the case, but it represents another topic that we must leave 
for future study. 
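
The rescaling can be explored numerically. The sketch below assumes the birth and death rates $\lambda_j = \lambda N(1 - j/N)[(j/N)(1 - \gamma_1) + (1 - j/N)\gamma_2]$ and $\mu_j = \lambda N(j/N)[(j/N)\gamma_1 + (1 - j/N)(1 - \gamma_2)]$ of the gene frequency model. It evaluates the infinitesimal mean and second moment of the rescaled process at the state corresponding to $V_N(t) = v$ and compares them with $-\beta v$ and $\sigma^2$. All numerical values and function names are illustrative assumptions.

```python
import math

def birth_rate(j, N, lam, g1, g2):
    """Assumed birth rate lambda_j of the gene frequency model."""
    p = j / N
    return lam * N * (1 - p) * (p * (1 - g1) + (1 - p) * g2)

def death_rate(j, N, lam, g1, g2):
    """Assumed death rate mu_j of the gene frequency model."""
    p = j / N
    return lam * N * p * (p * g1 + (1 - p) * (1 - g2))

lam, g1, g2, v = 1.0, 0.02, 0.03, 0.7          # illustrative values
pi = g2 / (g1 + g2)
beta = lam * (g1 + g2)
sigma2 = 2 * lam * g1 * g2 / (g1 + g2) ** 2
for N in (10**2, 10**4, 10**6):
    j = N * pi + v * math.sqrt(N)              # state corresponding to V_N(t) = v
    up, down = birth_rate(j, N, lam, g1, g2), death_rate(j, N, lam, g1, g2)
    drift = (up - down) / math.sqrt(N)         # per-unit-time mean of Delta V
    second = (up + down) / N                   # per-unit-time second moment of Delta V
    print(N, drift, -beta * v, second, sigma2)
```

With these rates the drift matches $-\beta v$ for every $N$, while the second moment approaches $\sigma^2$ only as $N$ grows.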
5.2. The Position Process


If $V(t)$ is the velocity of the pollen grain, then its position at time $t$ would be

$$S(t) = S(0) + \int_0^t V(u)\,du. \tag{5.8}$$

Because the Ornstein-Uhlenbeck process is continuous, the integral in
(5.8) is well-defined. The Ornstein-Uhlenbeck process is normally
distributed, and so is each approximating sum to the integral in (5.8). It must
be, then, that the position process $S(t)$ is normally distributed, and to
describe it, we need only evaluate its mean and covariance functions. To
simplify the mathematics without losing any essentials, let us assume that
$S(0) = V(0) = 0$. Then

$$E[S(t)] = E\Bigl[\int_0^t V(s)\,ds\Bigr] = \int_0^t E[V(s)]\,ds = 0.$$

(The interchange of integral and expectation needs justification. Since the
expected value of a sum is always the sum of the expected values, the
interchange of expectation with Riemann approximating sums is clearly
valid. What is needed is justification in the limit as the approximating
sums converge to the integrals.)

$$\mathrm{Var}[S(t)] = E[S(t)^2] = E\Bigl[\Bigl\{\int_0^t V(s)\,ds\Bigr\}^2\Bigr] = E\Bigl[\int_0^t V(s)\,ds\int_0^t V(u)\,du\Bigr]$$

$$= \int_0^t\!\!\int_0^t E[V(s)V(u)]\,du\,ds = 2\int_0^t\!\!\int_0^s E[V(s)V(u)]\,du\,ds$$

$$= \frac{\sigma^2}{\beta}\int_0^t\!\!\int_0^s\bigl(e^{-\beta(s - u)} - e^{-\beta(s + u)}\bigr)\,du\,ds \quad\text{(using (5.5))}$$

$$= \frac{\sigma^2}{\beta^2}\int_0^t e^{-\beta s}\bigl(e^{\beta s} - 1 - 1 + e^{-\beta s}\bigr)\,ds$$

$$= \frac{\sigma^2}{\beta^2}\Bigl[t - \frac{2}{\beta}\bigl(1 - e^{-\beta t}\bigr) + \frac{1}{2\beta}\bigl(1 - e^{-2\beta t}\bigr)\Bigr]. \tag{5.9}$$

This variance behaves like that of a Brownian motion when $t$ is large in
the sense that

$$\frac{\mathrm{Var}[S(t)]}{t} \to \frac{\sigma^2}{\beta^2} \quad\text{as } t \to \infty.$$

That is, observed over a long time span, the particle's position as modeled
by an Ornstein-Uhlenbeck velocity behaves much like a Brownian motion
with variance parameter $\sigma^2/\beta^2$. In this sense, the Ornstein-Uhlenbeck
model agrees with the Brownian motion model over long time spans and
improves upon it for short durations. Section 5.4 offers another approach.
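
The closed form (5.9) and its long-run behavior can be checked against a direct numerical double integral of the covariance function (5.5). The sketch below uses a plain midpoint rule. The grid size, the parameter values, and the function names are assumptions of the illustration.

```python
import math

def cov_v(u, s, beta, sigma2):
    """Cov[V(u), V(s)] from (5.5), extended symmetrically to u > s."""
    u, s = min(u, s), max(u, s)
    return sigma2 / (2 * beta) * (math.exp(-beta * (s - u)) - math.exp(-beta * (s + u)))

def var_position(t, beta, sigma2):
    """Closed form (5.9)."""
    return sigma2 / beta**2 * (
        t - 2 / beta * (1 - math.exp(-beta * t)) + (1 - math.exp(-2 * beta * t)) / (2 * beta)
    )

def var_position_numeric(t, beta, sigma2, n=300):
    """Midpoint-rule double integral of the covariance over [0, t] x [0, t]."""
    h = t / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(cov_v(u, s, beta, sigma2) for u in pts for s in pts) * h * h

beta, sigma2 = 0.5, 2.0     # illustrative values
closed = var_position(3.0, beta, sigma2)
numeric = var_position_numeric(3.0, beta, sigma2)
ratio = var_position(1e4, beta, sigma2) / 1e4
print(closed, numeric, ratio, sigma2 / beta**2)
```

The first two numbers should agree closely, and the ratio $\mathrm{Var}[S(t)]/t$ at a large $t$ should be near $\sigma^2/\beta^2$.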


Example Stock Prices It is sometimes assumed that the market price
of a share of stock follows the position process under an Ornstein-Uhlenbeck
velocity. The model is consistent with the Brownian motion
model over long time spans. In the short term, the price changes are not
independent but have an exponentially decreasing correlation meant to
capture some notion of a market momentum. Suppose a call option is to
be exercised, if profitable at a striking price of $a$, at some fixed time $t$ in
the future. If $V(0) = 0$ and $S(0) = z$ is the current stock price, then the
expected value of the option is

$$E[(S(t) - a)^+] = \tau\Bigl\{\phi\Bigl(\frac{a - \mu}{\tau}\Bigr) - \Bigl(\frac{a - \mu}{\tau}\Bigr)\Bigl[1 - \Phi\Bigl(\frac{a - \mu}{\tau}\Bigr)\Bigr]\Bigr\},$$

where

$$\mu = z,$$

and

$$\tau^2 = \frac{\sigma^2}{\beta^2}\Bigl[t - \frac{2}{\beta}\bigl(1 - e^{-\beta t}\bigr) + \frac{1}{2\beta}\bigl(1 - e^{-2\beta t}\bigr)\Bigr].$$

Note that $\mu$ and $\tau^2$ are the mean and variance of $S(t)$. The derivation is left
for Problem 5.4.
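
The expected option value is straightforward to evaluate once $\tau$ is computed from (5.9). The sketch below does so. The numerical inputs are invented for illustration and the function names are not from the text.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def position_sd(t, beta, sigma2):
    """tau, the standard deviation of S(t), from (5.9)."""
    var = sigma2 / beta**2 * (
        t - 2 / beta * (1 - math.exp(-beta * t)) + (1 - math.exp(-2 * beta * t)) / (2 * beta)
    )
    return math.sqrt(var)

def expected_option_value(z, a, t, beta, sigma2):
    """E[(S(t) - a)^+] when S(t) is normal with mean z and standard deviation tau."""
    tau = position_sd(t, beta, sigma2)
    w = (a - z) / tau
    return tau * (phi(w) - w * (1.0 - Phi(w)))

# Hypothetical inputs: current price 59, striking price 60, half a year ahead.
print(expected_option_value(z=59.0, a=60.0, t=0.5, beta=2.0, sigma2=400.0))
```

When the striking price equals the current price, the expression reduces to $\tau\phi(0) = \tau/\sqrt{2\pi}$, which gives a quick consistency check.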


5.3. The Long Run Behavior 


It is easily seen from (5.2) and (5.3) that for large values of t, the mean of
the Ornstein-Uhlenbeck process converges to zero and the variance to
σ²/(2β). This leads to a limiting distribution for the process in which

\[
\lim_{t \to \infty} \Pr\{V(t) < y \mid V(0) = x\} = \Phi\!\left(\frac{\sqrt{2\beta}\,y}{\sigma}\right).
\tag{5.11}
\]

That is, the limiting distribution of the process is normal with mean zero
and variance σ²/(2β). We now set forth a representation of a stationary
Ornstein-Uhlenbeck process, a process for which the limiting distribution
in (5.11) holds for all finite times as well as in the limit. The stationary
Ornstein-Uhlenbeck process $\{V^s(t); -\infty < t < \infty\}$ is represented in terms
of a Brownian motion by

\[
V^s(t) = \frac{\sigma}{\sqrt{2\beta}}\, e^{-\beta t} B\!\left(e^{2\beta t}\right), \qquad -\infty < t < \infty.
\tag{5.12}
\]

The stationary Ornstein-Uhlenbeck process is Gaussian (see Section 1.4)
and has mean zero. The covariance calculation is


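\[
\begin{aligned}
\Gamma(s, t) &= \operatorname{Cov}[V^s(s), V^s(t)] \\
&= \frac{\sigma^2}{2\beta}\, e^{-\beta(s+t)} \operatorname{Cov}\!\left[B(e^{2\beta s}), B(e^{2\beta t})\right] \\
&= \frac{\sigma^2}{2\beta}\, e^{-\beta(s+t)} \min\{e^{2\beta s}, e^{2\beta t}\} \\
&= \frac{\sigma^2}{2\beta}\, e^{-\beta|t-s|}.
\end{aligned}
\tag{5.13}
\]

The stationary Ornstein-Uhlenbeck process is the unique Gaussian
process with mean zero and covariance (5.13).

The independence of the Brownian increments implies that the stationary
Ornstein-Uhlenbeck process is a Markov process, and it is
straightforward to verify that the transition probabilities are given by (5.4).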
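The covariance calculation can be replayed numerically. This is a small sketch, assuming only the representation of the stationary process as a time-changed Brownian motion and the fact that Cov[B(a), B(b)] = min{a, b}; the function names are illustrative.

```python
import math

def cov_from_representation(s, t, sigma, beta):
    """Covariance computed step by step from the Brownian representation (5.12)."""
    scale = sigma**2 / (2.0 * beta)
    brownian_cov = min(math.exp(2.0 * beta * s), math.exp(2.0 * beta * t))
    return scale * math.exp(-beta * (s + t)) * brownian_cov

def cov_closed_form(s, t, sigma, beta):
    """Closed form: (sigma^2 / (2 beta)) * exp(-beta * |t - s|)."""
    return sigma**2 / (2.0 * beta) * math.exp(-beta * abs(t - s))

# The two expressions agree, and the result depends on s and t only through |t - s|.
for s, t in [(0.0, 1.0), (2.0, 0.5), (-1.0, 3.0)]:
    print(s, t, cov_from_representation(s, t, 1.0, 0.2), cov_closed_form(s, t, 1.0, 0.2))
```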


Example  An Ehrenfest Urn Model in Continuous Time  A single particle
switches repeatedly between urn A and urn B. Suppose that the duration
it spends in an urn before moving is an exponentially distributed
random variable with parameter β, and that all durations are independent.
Let ξ(t) = 1 if the particle is in urn A at time t, and ξ(t) = −1 if in urn B.
Then {ξ(t); t ≥ 0} is a two-state Markov process in continuous time for
which (see III, (3.14))

\[
\Pr\{\xi(t+s) = 1 \mid \xi(t) = 1\} = \frac{1}{2} + \frac{1}{2}\, e^{-2\beta s}.
\tag{5.14}
\]


Let us further stipulate that the particle is equally likely to be in either urn
at time zero. It follows, then, that it is equally likely to be in either urn at
all times, and that therefore, E[ξ(t)] = 0 for all t. Using (5.14) and the
symmetry of the process, we may derive the covariance. We have

\[
\begin{aligned}
E[\xi(t)\xi(t+s)] &= \tfrac{1}{2}\Pr\{\xi(t+s) = 1 \mid \xi(t) = 1\} \\
&\quad + \tfrac{1}{2}\Pr\{\xi(t+s) = -1 \mid \xi(t) = -1\} \\
&\quad - \tfrac{1}{2}\Pr\{\xi(t+s) = -1 \mid \xi(t) = 1\} \\
&\quad - \tfrac{1}{2}\Pr\{\xi(t+s) = 1 \mid \xi(t) = -1\} \\
&= e^{-2\beta s}.
\end{aligned}
\tag{5.15}
\]


Now consider N of these particles, each alternating between the urns
independently of the others, and let $\xi_i(t)$ track the position of the ith particle
at time t. The disparity between the numbers of particles in the two urns
is measured by

\[
S_N(t) = \sum_{i=1}^{N} \xi_i(t).
\]

If $S_N(t) = 0$, then the urns contain equal numbers of particles. If $S_N(t) = k$,
then there are (N + k)/2 particles in urn A. The central limit principle for
random functions suggests that

\[
V_N(t) = \frac{1}{\sqrt{N}}\, S_N(t)
\]

should, for large N, behave similarly to a Gaussian process with mean
zero and covariance Γ(s, t) = exp{−2β|t − s|}. This limiting process is a
stationary Ornstein-Uhlenbeck process; matching (5.13) requires decay
rate 2β in place of β, and σ² = 4β. Thus we have derived the approximation

\[
S_N(t) \approx \sqrt{N}\, V^s(t)
\]

for the behavior of this continuous-time urn model when the number of
particles is large.
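The single-particle covariance e^{−2βs} can be illustrated by simulation. The sketch below assumes the particle changes urns at the events of a rate-β Poisson process, so that ξ(t)ξ(t+s) equals +1 or −1 according to whether the number of moves in an interval of length s is even or odd. The function names, sample size, and seed are arbitrary illustrative choices.

```python
import math
import random

def simulate_sign_product(beta, s, rng):
    """One draw of xi(t) * xi(t+s): +1 for an even number of urn changes in s, else -1."""
    flips = 0
    clock = rng.expovariate(beta)       # time of the first move
    while clock <= s:
        flips += 1
        clock += rng.expovariate(beta)  # independent exponential sojourn in the next urn
    return -1 if flips % 2 else 1

def estimate_covariance(beta, s, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[xi(t) xi(t+s)] for a stationary start."""
    rng = random.Random(seed)
    return sum(simulate_sign_product(beta, s, rng) for _ in range(n_samples)) / n_samples

beta, s = 1.0, 0.5
print("simulated:", estimate_covariance(beta, s), " theory:", math.exp(-2.0 * beta * s))
```

The estimate carries sampling error of order 1/√n_samples, so agreement with the theoretical value is only approximate.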


5.4. Brownian Measure and Integration* 


We state, in the form of a theorem without proof, an exceedingly useful 
formula for computing certain functionals of Gaussian processes. This 
theorem provides a tiny glimpse into a vast and rich area of stochastic 
process theory, and we included it in this elementary text in a blatant at- 
tempt to entice the student towards further study. 


Theorem 5.1  Let g(x) be a continuous function and let {B(t); t ≥ 0} be
a standard Brownian motion. For each fixed value of t > 0, there exists a
random variable

\[
\mathcal{I}(g) = \int_0^t g(x)\,dB(x)
\tag{5.16}
\]

that is the limit of the approximating sums

\[
\mathcal{I}_n(g) = \sum_{k=1}^{2^n} g\!\left(\frac{k}{2^n}\,t\right)\left[B\!\left(\frac{k}{2^n}\,t\right) - B\!\left(\frac{k-1}{2^n}\,t\right)\right]
\tag{5.17}
\]

as n → ∞. The random variable $\mathcal{I}(g)$ is normally distributed with mean
zero and variance

\[
\operatorname{Var}[\mathcal{I}(g)] = \int_0^t g^2(u)\,du.
\tag{5.18}
\]

If f(x) is another continuous function of x, then $\mathcal{I}(f)$ and $\mathcal{I}(g)$ have a joint
normal distribution with covariance

\[
E[\mathcal{I}(f)\,\mathcal{I}(g)] = \int_0^t f(x)\,g(x)\,dx.
\tag{5.19}
\]


* This subsection is both more advanced and more abstract than those that have
preceded it.

The proof of the theorem is more tedious than difficult, but it does require
knowledge of facts that are not included among our prerequisites. The
theorem asserts that a sequence of random variables, the Riemann
approximations, converges to another random variable. The usual proof begins
with showing that the expected mean square of the difference between
distinct Riemann approximations satisfies the Cauchy criterion for
convergence, and then goes on from there.

When g(x) is differentiable, then an integration by parts may be
validated, which shows that

\[
\int_0^t g(x)\,dB(x) = g(t)B(t) - \int_0^t B(x)\,g'(x)\,dx,
\tag{5.20}
\]

and this approach may yield a concrete representation of the integral in
certain circumstances. For example, if g(x) = 1, then g'(x) = 0, and

\[
\int_0^t 1\,dB(x) = g(t)B(t) - 0 = B(t),
\]

as one would hope. When g(x) = t − x, then g'(x) = −1, and

\[
\int_0^t (t - x)\,dB(x) = \int_0^t B(x)\,dx.
\tag{5.21}
\]

The process on the right side of (5.21) is called integrated Brownian
motion. Theorem 5.1 then asserts that integrated Brownian motion is
normally distributed with mean zero and variance

\[
\operatorname{Var}\!\left[\int_0^t B(x)\,dx\right] = \int_0^t (t - x)^2\,dx = \frac{t^3}{3}.
\]
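The variance formula of Theorem 5.1 can be seen at the level of the approximating sums. Because Brownian increments over disjoint intervals are independent with variance equal to the interval length, the nth approximating sum has variance equal to a Riemann sum of g². The sketch below assumes that reading of the approximating sums; the function name and test values are illustrative.

```python
def approx_sum_variance(g, t, n):
    """Variance of the n-th approximating sum: sum of g(k t / 2^n)^2 * (t / 2^n)."""
    m = 2**n
    dt = t / m  # each Brownian increment has variance dt
    return sum(g(k * dt) ** 2 for k in range(1, m + 1)) * dt

t = 2.0
for n in (2, 6, 12):
    # Integrated Brownian motion corresponds to g(x) = t - x; the limit is t^3 / 3.
    print(n, approx_sum_variance(lambda x: t - x, t, n), t**3 / 3)
```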


The calculus of the Brownian integral of Theorem 5.1 offers a fresh and 
convenient way to determine functionals of some Gaussian processes. For 
example, in the case of the Ornstein—Uhlenbeck process, we have the in- 
tegral representation 


\[
V(t) = v e^{-\beta t} + \sigma \int_0^t e^{-\beta(t-u)}\,dB(u).
\tag{5.22}
\]

The second term on the right of (5.22) expresses the random component
of the Ornstein-Uhlenbeck process as an exponentially weighted moving
average of infinitesimal Brownian increments. According to Theorem 5.1,
this random component has mean zero and is normally distributed. We use
Theorem 5.1 to determine the covariance. For 0 < s < t,


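\[
\begin{aligned}
\operatorname{Cov}[V(s), V(t)] &= E\left[\{V(s) - v e^{-\beta s}\}\{V(t) - v e^{-\beta t}\}\right] \\
&= \sigma^2 E\left[\int_0^s e^{-\beta(s-u)}\,dB(u) \int_0^t e^{-\beta(t-w)}\,dB(w)\right] \\
&= \sigma^2 \int_0^t \mathbf{1}(u < s)\, e^{-\beta(s-u)} e^{-\beta(t-u)}\,du \\
&= \sigma^2 e^{-\beta(s+t)} \int_0^s e^{2\beta u}\,du \\
&= \frac{\sigma^2}{2\beta}\, e^{-\beta(s+t)}\left(e^{2\beta s} - 1\right) \\
&= \frac{\sigma^2}{2\beta}\left(e^{-\beta(t-s)} - e^{-\beta(t+s)}\right),
\end{aligned}
\]

in agreement with (5.5).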


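Example  The Position Process Revisited  Let us assume that V(0) =
v = 0. The integral of the Ornstein-Uhlenbeck velocity process gives the
particle's position S(t) at time t. If we replace the integrand by its
representation (5.22) (v = 0), we obtain

\[
\begin{aligned}
S(t) = \int_0^t V(s)\,ds &= \sigma \int_0^t\!\int_0^s e^{-\beta(s-u)}\,dB(u)\,ds \\
&= \sigma \int_0^t\!\int_u^t e^{-\beta(s-u)}\,ds\,dB(u) \\
&= \sigma \int_0^t e^{\beta u} \int_u^t e^{-\beta s}\,ds\,dB(u) \\
&= \frac{\sigma}{\beta} \int_0^t e^{\beta u}\left(e^{-\beta u} - e^{-\beta t}\right)dB(u) \\
&= \frac{\sigma}{\beta} \int_0^t \left(1 - e^{-\beta(t-u)}\right)dB(u).
\end{aligned}
\tag{5.23}
\]

Theorem 5.1 applied to (5.23) tells us that the position S(t) at time t is
normally distributed with mean zero and variance

\[
\begin{aligned}
\operatorname{Var}[S(t)] &= \frac{\sigma^2}{\beta^2} \int_0^t \left(1 - e^{-\beta(t-u)}\right)^2 du \\
&= \frac{\sigma^2}{\beta^2} \int_0^t \left(1 - e^{-\beta w}\right)^2 dw \\
&= \frac{\sigma^2}{\beta^2}\left[t - \frac{2}{\beta}\left(1 - e^{-\beta t}\right) + \frac{1}{2\beta}\left(1 - e^{-2\beta t}\right)\right],
\end{aligned}
\]

in agreement with (5.9). Problem 5.2 calls for using Theorem 5.1 to
determine the covariance between the velocity V(t) and position S(t).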

The position process under an Ornstein—Uhlenbeck velocity behaves 
like a Brownian motion over large time spans, and we can see this more 
clearly from the Brownian integral representation in (5.23). If we carry 
out the first term in the integral (5.23) and recognize that the second part 
is V(t) itself, we see that 


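\[
\begin{aligned}
S(t) &= \frac{\sigma}{\beta} \int_0^t \left(1 - e^{-\beta(t-u)}\right)dB(u) \\
&= \frac{1}{\beta}\left[\sigma B(t) - \sigma \int_0^t e^{-\beta(t-u)}\,dB(u)\right] \\
&= \frac{1}{\beta}\left[\sigma B(t) - V(t)\right].
\end{aligned}
\tag{5.24}
\]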


Let us introduce a rescaled position process that will allow us to better see 
changes in position over large time spans. Accordingly, for N > 0, let 


\[
\begin{aligned}
S_N(t) &= \frac{1}{\sqrt{N}}\, S(Nt) \\
&= \frac{1}{\beta\sqrt{N}}\left[\sigma B(Nt) - V(Nt)\right] \\
&= \frac{1}{\beta}\left[\sigma \tilde{B}(t) - \frac{1}{\sqrt{N}}\, V(Nt)\right],
\end{aligned}
\tag{5.25}
\]

where $\tilde{B}(t) = B(Nt)/\sqrt{N}$ remains a standard Brownian motion. (See
Exercise 1.2.) Because the variance of V(t) is always less than or equal to
σ²/(2β), the variance of $V(Nt)/\sqrt{N}$ becomes negligible for large N. Equation
(5.25) then shows more clearly in what manner the position process
becomes like a Brownian motion: For large N,

\[
S_N(t) \approx \frac{\sigma}{\beta}\, \tilde{B}(t).
\]


Exercises 


5.1. An Ornstein-Uhlenbeck process V(t) has σ² = 1 and β = 0.2. What
is the probability that V(t) ≤ 1 for t = 1, 10, and 100? Assume that
V(0) = 0.


5.2. The velocity of a certain particle follows an Ornstein-Uhlenbeck
process with σ² = 1 and β = 0.2. The particle starts at rest (v = 0) from
position S(0) = 0. What is the probability that it is more than one unit
away from its origin at time t = 1? What is the probability at times t = 10
and t = 100?


5.3. Let $\xi_1, \xi_2, \ldots$ be independent standard normal random variables
and β a constant, 0 < β < 1. A discrete analogue to the
Ornstein-Uhlenbeck process may be constructed by setting

\[
V_0 = v \quad \text{and} \quad V_n = (1 - \beta)V_{n-1} + \xi_n \quad \text{for } n \ge 1.
\]

(a) Determine the mean value function and covariance function for
$\{V_n\}$.

(b) Let $\Delta V = V_{n+1} - V_n$. Determine the conditional mean and variance
of ΔV, given that $V_n = v$.


Problems 


5.1. Let $\xi_1, \xi_2, \ldots$ be independent standard normal random variables
and β a constant, 0 < β < 1. A discrete analogue to the
Ornstein-Uhlenbeck process may be constructed by setting

\[
V_0 = v \quad \text{and} \quad V_n = (1 - \beta)V_{n-1} + \xi_n \quad \text{for } n \ge 1.
\]

(a) Show that

\[
V_n = (1 - \beta)^n v + \sum_{k=1}^{n} (1 - \beta)^{n-k}\,\xi_k.
\]

Comment on the comparison with (5.22).
(b) Let $\Delta V_n = V_n - V_{n-1}$, $S_n = v + V_1 + \cdots + V_n$, and
$B_n = \xi_1 + \cdots + \xi_n$. Show that

\[
V_n = v - \beta S_{n-1} + B_n.
\]

Compare and contrast with (5.24).


5.2. Let S(t) be the position process corresponding to an
Ornstein-Uhlenbeck velocity V(t). Assume that S(0) = V(0) = 0. Obtain the
covariance between S(t) and V(t).


5.3. Verify the option valuation formulation (5.10). 


Hint: Use the result of Exercise 4.6. 


5.4. In the Ehrenfest urn model (see III, Section 3.2) for molecular
diffusion through a membrane, if there are i particles in urn A, the
probability that there will be i + 1 after one time unit is 1 − i/(2N), and the
probability of i − 1 is i/(2N), where 2N is the aggregate number of particles in
both urns. Following III, Section 3.2, let $Y_n$ be the number of particles in
urn A after the nth transition, and let $X_n = Y_n - N$. Let $\Delta X = X_{n+1} - X_n$ be
the change in urn composition. The probability law is

\[
\Pr\{\Delta X = \pm 1 \mid X_n = x\} = \frac{1}{2} \mp \frac{x}{2N}.
\]

We anticipate a limiting process in which the time between transitions
becomes small and the number of particles becomes large. Accordingly, let
Δt = 1/N and measure fluctuations of a rescaled process in units of order
$1/\sqrt{N}$. The definition of the rescaled process is

\[
V_N(t) = \frac{X_{[Nt]}}{\sqrt{N}}.
\]

Note that in the duration t = 0 to t = 1 in the rescaled process, there
are N transitions in the urns, and a unit change in the rescaled process
corresponds to a fluctuation of order $\sqrt{N}$ in the urn composition. Let
$\Delta V = V_N(t + 1/N) - V_N(t)$ be the displacement in the rescaled process
over the time interval Δt = 1/N. Show that

\[
E[\Delta V \mid V_N(t) = v] = \frac{1}{\sqrt{N}}\left(\frac{1}{2} - \frac{v}{2\sqrt{N}}\right) - \frac{1}{\sqrt{N}}\left(\frac{1}{2} + \frac{v}{2\sqrt{N}}\right) = -v\left(\frac{1}{N}\right) = -v\,\Delta t
\]

and that $(\Delta V)^2 = 1/N$, whence

\[
\operatorname{Var}[\Delta V \mid V_N(t) = v] = E[(\Delta V)^2] - \{E[\Delta V]\}^2 = \frac{1}{N} + o\!\left(\frac{1}{N}\right) = \Delta t + o(\Delta t).
\]


Chapter IX 
Queueing Systems 


1. Queueing Processes 


A queueing system consists of “customers” arriving at random times to 
some facility where they receive service of some kind and then depart. We 
use “customer” as a generic term. It may refer, for example, to bona fide 
customers demanding service at a counter, to ships entering a port, to 
batches of data flowing into a computer subsystem, to broken machines 
awaiting repair, and so on. Queueing systems are classified according to 


1. The input process, the probability distribution of the pattern of ar- 
rivals of customers in time; 

2. The service distribution, the probability distribution of the random 
time to serve a customer (or group of customers in the case of batch 
service); and 

3. The queue discipline, the number of servers and the order of cus- 
tomer service. 


While a variety of input processes may arise in practice, two simple and
frequently occurring types are mathematically tractable and give insights
into more complex cases. First is the scheduled input, where customers
arrive at fixed times T, 2T, 3T, .... The second most common model is the
"completely random" arrival process, where the times of customer
arrivals form a Poisson process. Understanding the axiomatic development
of the Poisson process in V may help one to evaluate the validity of the
Poisson assumption in any given application. Many theoretical results are
available when the times of customer arrivals form a renewal process.
Exponentially distributed interarrival times then correspond to a Poisson
process of arrivals as a special case.

We will always assume that the durations of service for individual cus- 
tomers are independent and identically distributed nonnegative random 
variables and are independent of the arrival process. The situation in which 
all service times are the same fixed duration D, is, then, a special case. 

The most common queue discipline is first come, first served, where 
customers are served in the same order in which they arrive. All of the 
models that we consider in this chapter are of this type. 

Queueing models aid the design process by predicting system perfor- 
mance. For example, a queueing model might be used to evaluate the 
costs and benefits of adding a server to an existing system. The models en- 
able us to calculate system performance measures in terms of more basic 
quantities. Some important measures of system behavior are 


1. The probability distribution of the number of customers in the sys- 
tem. Not only do customers in the system often incur costs, but in 
many systems, physical space for waiting customers must be 
planned for and provided. Large numbers of waiting customers can 
also adversely affect the input process by turning potential new cus- 
tomers away. (See Section 4.1 on queueing with balking.) 

2. The utilization of the server(s). Idle servers may incur costs without 
contributing to system performance. 

3. System throughput. The long run number of customers passing 
through the system is a direct measure of system performance. 

4. Customer waiting time. Long waits for service are annoying in the 
simplest queueing situations and directly associated with major costs 
in many large systems such as those describing ships waiting to 
unload at a port facility or patients awaiting emergency care at a 
hospital. 


1.1. The Queueing Formula L = λW


Consider a queueing system that has been operating sufficiently long to
have reached an appropriate steady state, or a position of statistical
equilibrium. Let

L = the average number of customers in the system;
λ = the rate of arrival of customers to the system; and
W = the average time spent by a customer in the system.


The equation L = λW is valid under great generality for such systems and
is of basic importance in the theory of queues, since it directly relates two
of our most important measures of system performance, the mean queue
size and the mean customer waiting time in the steady state, that is, mean
queue size and mean customer waiting time evaluated with respect to a
limiting or stationary distribution for the process.

The validity of L = λW does not rest on the details of any particular
model, but depends only upon long run mass flow balance relations. To
sketch this reasoning, consider a time T sufficiently long so that statistical
fluctuations have averaged out. Then the total number of customers to
have entered the system is λT, the total number to have departed is
λ(T − W), and the net number remaining in the system L must be the difference

\[
L = \lambda T - [\lambda(T - W)] = \lambda W.
\]

Figure 1.1 depicts the relation L = λW.

[Figure 1.1 graphic: cumulative numbers of arrivals and departures plotted
against time, in two panels: (a) Random arrivals, departures; (b) Smoothed values.]

Figure 1.1. The cumulative number of arrivals and departures in a queueing
system. The smoothed values in (b) are meant to symbolize long
run averages. The rate of arrivals per unit time is λ, the mean
number in the system is L, and the mean time a customer spends
in the system is W.

Of course, what we have done is by no means a proof, and indeed, we
shall give no proof. We shall, however, provide several sample
verifications of L = λW where L is the mean of the stationary distribution of
customers in the system, W is the mean customer time in the system
determined from the stationary distribution, and λ is the arrival rate in a
Poisson arrival process.
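The flow-balance idea can be illustrated on a finite record of arrival and departure times. The sketch below is not a proof and is not the stationary-distribution verification given later; it assumes the system is empty at time 0 and at the last departure, in which case the time-average number in the system equals the observed arrival rate times the average time in system exactly. The function name and the sample data are hypothetical.

```python
def little_check(arrivals, departures):
    """Return (L, lam, W) for customers with given entry and exit times.

    Assumes the observation window is [0, last departure] and that the
    system is empty at both ends of the window.
    """
    horizon = max(departures)
    n = len(arrivals)
    lam = n / horizon                                         # observed arrival rate
    W = sum(d - a for a, d in zip(arrivals, departures)) / n  # mean time in system
    # Integrate the number in system over time by sweeping through the events.
    events = sorted([(a, +1) for a in arrivals] + [(d, -1) for d in departures])
    area, count, last = 0.0, 0, 0.0
    for time, step in events:
        area += count * (time - last)
        count, last = count + step, time
    L = area / horizon                                        # time-average number in system
    return L, lam, W

L, lam, W = little_check([0.0, 1.0, 1.5, 4.0], [2.0, 2.5, 3.0, 5.0])
print(L, lam * W)
```

Both sides count the same total customer-time in the system, once by integrating over time and once by summing over customers.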

Let $L_0$ be the average number of customers waiting in the system who
are not yet being served, and let $W_0$ be the average waiting time in the
system excluding service time. In parallel to L = λW, we have the formula

\[
L_0 = \lambda W_0.
\tag{1.1}
\]

The total waiting time in the system is the sum of the waiting time before
service and the service time. In terms of means, we have

\[
W = W_0 + \text{mean service time}.
\tag{1.2}
\]


1.2. A Sampling of Queueing Models 


In the remainder of this chapter we will study a variety of queueing sys- 
tems. A standard shorthand is used in much of the queueing literature for 
identifying simple queueing models. The shorthand assumes that the ar- 
rival times form a renewal process, and the format A/B/c uses A to de- 
scribe the interarrival distribution, B to specify the individual customer 
service time distribution, and c to indicate the number of servers. The 
common cases for the first two positions are G = GI for a general or
arbitrary distribution, M (memoryless) for the exponential distribution, $E_k$
(Erlang) for the gamma distribution of order k, and D for a deterministic
distribution, a schedule of arrivals or fixed service times.
Some examples discussed in the sequel are the following: 


The M/M/1 queue  Arrivals follow a Poisson process; service times
are exponentially distributed; and there is a single server. The number
X(t) of customers in the system at time t forms a birth and death
process. (See Section 2.)


The M/M/∞ queue  There are Poisson arrivals and exponentially
distributed service times. Any number of customers are processed
simultaneously and independently. Often self-service situations may be
described by this model. In the older literature this was called the
"telephone trunking problem."

The M/G/1 queue  In this model there are Poisson arrivals but
arbitrarily distributed service times. The analysis proceeds with the help of
an embedded Markov chain.


More elaborate variations will also be set forth. Balking is the refusal 
of new customers to enter the system if the waiting line is too long. More 
generally, in a queueing system with balking, an arriving customer enters 
the system with a probability that depends on the size of the queue. Here 
it is important to distinguish between the arrival process and the input 
process, as shown in Figure 1.2. A special case is a queue with overflow, 
in which an arriving customer enters the queue if and only if there is at 
least one server free to begin service immediately. 

In a priority queue, customers are allowed to be of different types. Both 
the service discipline and the service time distribution may vary with the 
customer type. 

A queueing network is a collection of service facilities where the de- 
partures from some stations form the arrivals of others. The network is 
closed if the total number of customers is fixed, these customers continu- 
ously circulating through the system. The machine repair model (see the 
example entitled “Repairman Models” in VI, Section 4) is an example of 
a closed queueing network. In an open queueing network, customers may 
arrive from, and depart to, places outside the network, as well as move 
from station to station. Queueing network models have found much recent 
application in the design of complex information processing systems. 


[Figure 1.2 graphic: flow diagram. Customers from the arrival process enter
the system with probability $p_n$, forming the input process, and join the
waiting line ahead of the service facility; otherwise they depart the system
with probability $1 - p_n$.]

Figure 1.2  If n customers are waiting in a queueing system with balking, an
arriving customer enters the system with probability $p_n$ and does
not enter with probability $1 - p_n$.

Exercises


1.1. What design questions might be answered by modeling the follow- 
ing queueing systems? 


The Customer The Server 
(a) Arriving airplanes The runway 
(b) Cars A parking lot 
(c) Broken TVs Repairman 
(d) Patients Doctor 
(e) Fires Fire engine company 


What might be reasonable assumptions concerning the arrival process, 
service distribution, and priority in these instances? 


1.2. Consider a system, such as a barber shop, where the service re- 
quired is essentially identical for each customer. Then actual service times 
would tend to cluster near the mean service time. Argue that the expo- 
nential distribution would not be appropriate in this case. For what types 
of service situations might the exponential distribution be quite plausible? 


1.3. Oil tankers arrive at an offloading facility according to a Poisson
process whose rate is λ = 2 ships per day. Daily records show that there
is an average of 3 ships unloading or waiting to unload at any instant in
time. On average, what is the duration of time that a ship spends in port?
Assume that a ship departs immediately after unloading.


Problems 


1.1. Two dump trucks cycle between a gravel loader and a gravel
unloader. Suppose that the travel times are insignificant relative to the load
and unload times, which are exponentially distributed with parameters
μ and λ, respectively. Model the system as a closed queueing network.
Determine the long run gravel loads moved per unit time.


Hint: Refer to the example entitled "Repairman Models" in Section
4 of VI.

2. Poisson Arrivals, Exponential Service Times


The simplest and most extensively studied queueing models are those 
having a Poisson arrival process and exponentially distributed service 
times. In this case the queue size forms a birth and death process (see Sec- 
tions 3 and 4 of VI), and the corresponding stationary distribution is read- 
ily found. 

We let λ denote the intensity, or rate, of the Poisson arrival process and
assume that the service time distribution is exponential with parameter μ.
The corresponding density function is

\[
g(x) = \mu e^{-\mu x} \qquad \text{for } x > 0.
\tag{2.1}
\]

For the Poisson arrival process we have

\[
\Pr\{\text{An arrival in } [t, t+h)\} = \lambda h + o(h)
\tag{2.2}
\]

and

\[
\Pr\{\text{No arrivals in } [t, t+h)\} = 1 - \lambda h + o(h).
\tag{2.3}
\]

Similarly, the memoryless property of the exponential distribution as
expressed by its constant hazard rate (see I, Section 4.2) implies that

\[
\Pr\{\text{A service is completed in } [t, t+h) \mid \text{Service in progress at time } t\} = \mu h + o(h),
\tag{2.4}
\]

and

\[
\Pr\{\text{Service not completed in } [t, t+h) \mid \text{Service in progress at time } t\} = 1 - \mu h + o(h).
\tag{2.5}
\]

The service rate μ applies to a particular server. If k servers are
simultaneously operating, the probability that one of them completes service in a
time interval of duration h is (kμ)h + o(h), so that the system service rate
is kμ. The principle used here is the same as that used in deriving the
infinitesimal parameters of the Yule process (VI, Section 1).

We let X(t) denote the number of customers in the system at time t,
counting the customers undergoing service as well as those awaiting
service. The independence of arrivals in disjoint time intervals together with
the memoryless property of the exponential service time distribution
implies that X(t) is a time homogeneous Markov chain, in particular, a
birth and death process. (See Sections 3 and 4 of VI.)


2.1. The M/M/1 System 


We consider first the case of a single server and let X(t) denote the
number of customers in the system at time t. An increase in X(t) by one unit
corresponds to a customer arrival, and in view of (2.2) and (2.5) and the
postulated independence of service times and the arrival process, we have

\[
\begin{aligned}
\Pr\{X(t+h) = k+1 \mid X(t) = k\} &= [\lambda h + o(h)] \times [1 - \mu h + o(h)] \\
&= \lambda h + o(h) \qquad \text{for } k = 0, 1, \ldots.
\end{aligned}
\]

Similarly, a decrease in X(t) by one unit corresponds to a completion of
service, whence

\[
\Pr\{X(t+h) = k-1 \mid X(t) = k\} = \mu h + o(h) \qquad \text{for } k = 1, 2, \ldots.
\]

Then X(t) is a birth and death process with birth parameters

\[
\lambda_k = \lambda \qquad \text{for } k = 0, 1, 2, \ldots
\]

and death parameters

\[
\mu_k = \mu \qquad \text{for } k = 1, 2, \ldots.
\]

Of course, no completion of service is possible when the queue is empty.
We thus specify $\mu_0 = 0$.
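Let

\[
\pi_k = \lim_{t \to \infty} \Pr\{X(t) = k\} \qquad \text{for } k = 0, 1, \ldots
\]

be the limiting, or equilibrium, distribution of queue length. Section 4 of
VI describes a straightforward procedure for determining the limiting
distribution $\pi_k$ from the birth and death parameters $\lambda_k$ and $\mu_k$. The technique
is to first obtain intermediate quantities $\theta_j$ defined by

\[
\theta_0 = 1 \quad \text{and} \quad \theta_j = \frac{\lambda_0 \lambda_1 \cdots \lambda_{j-1}}{\mu_1 \mu_2 \cdots \mu_j} \qquad \text{for } j \ge 1,
\tag{2.6}
\]

and then

\[
\pi_0 = \frac{1}{\sum_{j=0}^{\infty} \theta_j} \quad \text{and} \quad \pi_k = \theta_k \pi_0 = \frac{\theta_k}{\sum_{j=0}^{\infty} \theta_j} \qquad \text{for } k \ge 1.
\tag{2.7}
\]

When $\sum_{j=0}^{\infty} \theta_j = \infty$, then $\lim_{t \to \infty} \Pr\{X(t) = k\} = 0$ for all k, and the queue
length grows unboundedly in time.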
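The two-step recipe in (2.6) and (2.7) translates directly into a short routine. The following is a sketch only: it truncates the state space at a finite number of states, which is an approximation introduced here and is reasonable only when the neglected tail of the θ sum is negligible. The function name and arguments are illustrative.

```python
def birth_death_stationary(birth, death, n_states):
    """Approximate limiting distribution on states 0..n_states-1.

    birth(k) returns lambda_k and death(k) returns mu_k (for k >= 1).
    """
    theta = [1.0]                                          # theta_0 = 1
    for j in range(1, n_states):
        theta.append(theta[-1] * birth(j - 1) / death(j))  # theta_j from theta_{j-1}, per (2.6)
    total = sum(theta)
    return [th / total for th in theta]                    # pi_k = theta_k / sum, per (2.7)

# Constant rates lambda = 1 and mu = 2, i.e. the single-server case.
pi = birth_death_stationary(lambda k: 1.0, lambda k: 2.0, 200)
print(pi[:4])
```

Passing different `birth` and `death` functions gives the other birth and death queues of this section without changing the routine.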
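For the M/M/1 queue at hand we readily compute $\theta_0 = 1$ and $\theta_j = (\lambda/\mu)^j$
for j = 1, 2, .... Then

\[
\sum_{j=0}^{\infty} \theta_j = \sum_{j=0}^{\infty} \left(\frac{\lambda}{\mu}\right)^j =
\begin{cases}
\dfrac{1}{1 - \lambda/\mu} & \text{if } \lambda < \mu, \\[2ex]
\infty & \text{if } \lambda \ge \mu.
\end{cases}
\]

Thus, no equilibrium distribution exists when the arrival rate λ is equal to
or greater than the service rate μ. In this case the queue length grows
without bound.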

When λ < μ, a bona fide limiting distribution exists, given by

\[
\pi_0 = \frac{1}{\sum_{j=0}^{\infty} \theta_j} = 1 - \frac{\lambda}{\mu}
\tag{2.8}
\]

and

\[
\pi_k = \pi_0 \theta_k = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^k \qquad \text{for } k = 0, 1, \ldots.
\tag{2.9}
\]

The equilibrium distribution (2.9) gives us the answer to many questions
involving the limiting behavior of the system. We recognize the form of
(2.9) as that of a geometric distribution, and then reference to I, Section
3.3 gives us the mean queue length in equilibrium to be

\[
L = \frac{\lambda}{\mu - \lambda}.
\tag{2.10}
\]

The ratio ρ = λ/μ is called the traffic intensity,

\[
\rho = \frac{\text{arrival rate}}{\text{system service rate}} = \frac{\lambda}{\mu}.
\tag{2.11}
\]

As the traffic intensity approaches one, the mean queue length
L = ρ/(1 − ρ) becomes infinite. Again using (2.8), the probability of being
served immediately upon arrival is

\[
\pi_0 = 1 - \frac{\lambda}{\mu},
\]

the probability, in the long run, of finding the server idle. The server
utilization, or long run fraction of time that the server is busy, is
$1 - \pi_0 = \lambda/\mu$.
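The geometric distribution (2.9) and the mean (2.10) can be cross-checked numerically. This is a minimal sketch with illustrative function names and rates; the mean is recomputed from the probabilities over a truncated range of states, which is accurate here because the geometric tail is negligible.

```python
def mm1_stationary(lam, mu, k_max):
    """pi_0, ..., pi_{k_max} for the M/M/1 queue, per (2.9)."""
    if lam >= mu:
        raise ValueError("no equilibrium distribution when lam >= mu")
    rho = lam / mu
    return [(1.0 - rho) * rho**k for k in range(k_max + 1)]

def mm1_mean_queue_length(lam, mu):
    """Mean number in system, per (2.10)."""
    if lam >= mu:
        raise ValueError("mean queue length is unbounded when lam >= mu")
    return lam / (mu - lam)

lam, mu = 1.0, 2.0                      # illustrative rates
pi = mm1_stationary(lam, mu, 200)
mean_from_pi = sum(k * p for k, p in enumerate(pi))
print("idle probability:", pi[0], " utilization:", 1.0 - pi[0])
print("mean from pi:", mean_from_pi, " formula:", mm1_mean_queue_length(lam, mu))
```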

We can also calculate the distribution of waiting time in the stationary
case when λ < μ. If an arriving customer finds n people in front of him,
his total waiting time T, including his own service time, is the sum of the
service times of himself and those ahead, all distributed exponentially
with parameter μ, and since the service times are independent of the
queue size, T has a gamma distribution of order n + 1 with scale
parameter μ,

\[
\Pr\{T \le t \mid n \text{ ahead}\} = \int_0^t \frac{\mu^{n+1} \tau^n e^{-\mu\tau}}{\Gamma(n+1)}\,d\tau.
\tag{2.12}
\]

By the law of total probability, we have

\[
\Pr\{T \le t\} = \sum_{n=0}^{\infty} \Pr\{T \le t \mid n \text{ ahead}\} \times \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right),
\]

since $(\lambda/\mu)^n (1 - \lambda/\mu)$ is the probability that in the stationary case a
customer on arrival will find n ahead in line. Now, substituting from (2.12),
we obtain

\[
\begin{aligned}
\Pr\{T \le t\} &= \sum_{n=0}^{\infty} \int_0^t \frac{\mu^{n+1} \tau^n e^{-\mu\tau}}{\Gamma(n+1)} \left(\frac{\lambda}{\mu}\right)^n \left(1 - \frac{\lambda}{\mu}\right) d\tau \\
&= \int_0^t \mu e^{-\mu\tau} \left(1 - \frac{\lambda}{\mu}\right) \sum_{n=0}^{\infty} \frac{(\tau\lambda)^n}{\Gamma(n+1)}\,d\tau \\
&= \int_0^t \left(1 - \frac{\lambda}{\mu}\right) \mu \exp\{-\tau(\mu - \lambda)\}\,d\tau = 1 - \exp[-t(\mu - \lambda)],
\end{aligned}
\]

which is also an exponential distribution.

The mean of this exponential waiting time distribution is the reciprocal
of the exponential parameter, or

\[
W = \frac{1}{\mu - \lambda}.
\tag{2.13}
\]

Reference to (2.10) and (2.13) verifies the fundamental queueing formula
L = λW.
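The law-of-total-probability step can be checked by summing the conditional distributions numerically. The sketch below uses the standard closed form for the distribution function of a sum of independent exponential service times (an Erlang distribution) and truncates the sum over n, both of which are choices made here for illustration; function names and rates are hypothetical.

```python
import math

def erlang_cdf(t, n_stages, mu):
    """Pr{sum of n_stages independent Exp(mu) service times <= t}."""
    term, partial = 1.0, 0.0
    for i in range(n_stages):
        partial += term              # adds (mu t)^i / i!
        term *= mu * t / (i + 1)
    return 1.0 - math.exp(-mu * t) * partial

def waiting_cdf_by_mixture(t, lam, mu, n_max=200):
    """Pr{T <= t}: mix the conditional laws over the geometric number ahead."""
    rho = lam / mu
    return sum((1.0 - rho) * rho**n * erlang_cdf(t, n + 1, mu) for n in range(n_max))

lam, mu, t = 1.0, 2.0, 1.5           # illustrative values
print(waiting_cdf_by_mixture(t, lam, mu), 1.0 - math.exp(-t * (mu - lam)))

# L = lambda * W with W = 1 / (mu - lambda) and L = lambda / (mu - lambda).
print(lam / (mu - lam), lam * (1.0 / (mu - lam)))
```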

A queueing system alternates between durations when the servers are
busy and durations when the system is empty and the servers are idle. An
idle period begins the instant the last customer leaves, and endures until
the arrival of the next customer. When the arrival process is Poisson of
rate λ, then an idle period is exponentially distributed with mean

\[
E[I_1] = \frac{1}{\lambda}.
\]

A busy period is an uninterrupted duration in which the system is not
empty. When arrivals to a queue follow a Poisson process, then the
successive durations $X_k$ from the commencement of the kth busy period to the
start of the next busy period form a renewal process (see Figure 2.1). Each
$X_k$ is composed of a busy portion $B_k$ and an idle portion $I_k$. Then the
renewal theorem (see “A Queueing Model” in VII, Section 5.3) applies to
tell us that $p_0(t)$, the probability that the system is empty at time t,
converges to

\[
\lim_{t \to \infty} p_0(t) = \pi_0 = \frac{E[I_1]}{E[I_1] + E[B_1]}.
\]

[Figure 2.1 graphic: sample path of X(t) against t, marking a busy period
$B_1$ followed by an idle period $I_1$.]

Figure 2.1  The busy periods $B_k$ and idle periods $I_k$ of a queueing system.
When arrivals form a Poisson process, then $X_k = B_k + I_k$,
k = 1, 2, ..., are independent identically distributed nonnegative
random variables, and thus form a renewal process.

We substitute the known quantities $\pi_0 = 1 - \lambda/\mu$ and $E[I_1] = 1/\lambda$ to
obtain

\[
1 - \frac{\lambda}{\mu} = \frac{1/\lambda}{1/\lambda + E[B_1]},
\]

which gives

\[
E[B_1] = \frac{1}{\mu - \lambda}
\]

for the mean length of a busy period.

In Section 3, in studying the M/G/1 system we will reverse this
reasoning, calculate the mean busy period directly, and then use renewal theory
to determine the server idle fraction $\pi_0$.
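The mean busy period can also be estimated by simulating the birth and death dynamics directly. This is an illustrative sketch, assuming Poisson arrivals of rate lam and a single exponential server of rate mu; the function names, sample size, and seed are arbitrary, and the estimate is subject to sampling error.

```python
import random

def simulate_busy_period(lam, mu, rng):
    """Length of one M/M/1 busy period, started by a single arriving customer."""
    in_system, elapsed = 1, 0.0
    while in_system > 0:
        elapsed += rng.expovariate(lam + mu)  # time until the next arrival or departure
        if rng.random() < lam / (lam + mu):   # the next event is an arrival
            in_system += 1
        else:                                 # the next event is a service completion
            in_system -= 1
    return elapsed

def mean_busy_period(lam, mu, n_periods=20000, seed=1):
    rng = random.Random(seed)
    return sum(simulate_busy_period(lam, mu, rng) for _ in range(n_periods)) / n_periods

lam, mu = 1.0, 2.0
print("simulated:", mean_busy_period(lam, mu), " renewal argument:", 1.0 / (mu - lam))
```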


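2.2 The M/M/∞ System


When an unlimited number of servers are always available, then all
customers in the system at any instant are simultaneously being served. The
departure rate of a single customer being μ, the departure rate of k
customers is kμ, and we obtain the birth and death parameters

\[
\lambda_k = \lambda \quad \text{and} \quad \mu_k = k\mu \qquad \text{for } k = 0, 1, \ldots.
\]

The auxiliary quantities of (2.6) are

\[
\theta_k = \frac{\lambda_0 \lambda_1 \cdots \lambda_{k-1}}{\mu_1 \mu_2 \cdots \mu_k} = \frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k \qquad \text{for } k = 0, 1, \ldots,
\]

which sum to

\[
\sum_{k=0}^{\infty} \theta_k = \sum_{k=0}^{\infty} \frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k = e^{\lambda/\mu},
\]

whence

\[
\pi_0 = \frac{1}{\sum_{k=0}^{\infty} \theta_k} = e^{-\lambda/\mu}
\]

and

\[
\pi_k = \theta_k \pi_0 = \frac{(\lambda/\mu)^k e^{-\lambda/\mu}}{k!} \qquad \text{for } k = 0, 1, \ldots,
\tag{2.14}
\]

a Poisson distribution with mean queue length

\[
L = \frac{\lambda}{\mu}.
\]

Since a customer in this system begins service immediately upon
arrival, customer waiting time consists only of the exponentially distributed
service time, and the mean waiting time is W = 1/μ. Again, the basic
queueing formula L = λW is verified.

The M/G/∞ queue will be developed extensively in the next section.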
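The Poisson form of the M/M/∞ equilibrium distribution can be confirmed by building the θ quantities from the rates λ_k = λ and μ_k = kμ. The sketch below truncates at a finite number of states; the function names and rates are illustrative.

```python
import math

def mminf_stationary(lam, mu, k_max):
    """Poisson probabilities with mean lam / mu, per (2.14)."""
    a = lam / mu
    return [math.exp(-a) * a**k / math.factorial(k) for k in range(k_max + 1)]

def mminf_from_rates(lam, mu, k_max):
    """Same distribution built from theta_k = theta_{k-1} * lam / (k * mu), truncated."""
    theta = [1.0]
    for k in range(1, k_max + 1):
        theta.append(theta[-1] * lam / (k * mu))
    total = sum(theta)
    return [th / total for th in theta]

lam, mu = 3.0, 1.0                    # illustrative rates
poisson = mminf_stationary(lam, mu, 60)
built = mminf_from_rates(lam, mu, 60)
print(max(abs(p - q) for p, q in zip(poisson, built)))
print("mean:", sum(k * p for k, p in enumerate(poisson)), " lam * W:", lam * (1.0 / mu))
```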


2.3. The M/M/s System 


When a fixed number s of servers are available and the assumption is
made that a server is never idle if customers are waiting, then the
appropriate birth and death parameters are

\[
\lambda_k = \lambda \qquad \text{for } k = 0, 1, 2, \ldots,
\]

\[
\mu_k =
\begin{cases}
k\mu & \text{for } k = 0, 1, \ldots, s, \\
s\mu & \text{for } k > s.
\end{cases}
\]

If X(t) is the number of customers in the system at time t, then the
number undergoing service is min{X(t), s}, and the number waiting for service
is max{X(t) − s, 0}. The system is depicted in Figure 2.2.


[Figure 2.2 graphic: arrivals join a common waiting line that feeds
s = 5 parallel servers, followed by departures.]

Figure 2.2 A queueing system with s servers.


The auxiliary quantities are given by 


1/A\ 
Ku fork =0,1,...,5, 
g = Del ee SP 
k= en! $ Ks 
MM" Me —(=) (=) fork =s 
s!\pu/ \sp 


(2.15) 
_ L (A/ py 
7 2, J! (“J + s!(1 — A/sp) for A'S SH 


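The traffic intensity in an M/M/s system is $\rho = \lambda/(s\mu)$. Again,
as the traffic intensity approaches one, the mean queue length becomes
unbounded. When $\lambda < s\mu$, then from (2.8) and (2.15),

$$\pi_0 = \left\{\sum_{j=0}^{s-1}\frac{1}{j!}\left(\frac{\lambda}{\mu}\right)^j + \frac{(\lambda/\mu)^s}{s!\,(1 - \lambda/s\mu)}\right\}^{-1}$$

and

$$\pi_k = \begin{cases}
\dfrac{1}{k!}\left(\dfrac{\lambda}{\mu}\right)^k\pi_0 & \text{for } k = 0, 1, \ldots, s,\\[2ex]
\dfrac{1}{s!}\left(\dfrac{\lambda}{\mu}\right)^s\left(\dfrac{\lambda}{s\mu}\right)^{k-s}\pi_0 & \text{for } k \ge s.
\end{cases} \tag{2.16}$$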


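We evaluate $L_0$, the mean number of customers in the system waiting
for, and not undergoing, service. Then

$$L_0 = \sum_{j=s}^{\infty}(j - s)\pi_j = \sum_{k=0}^{\infty}k\,\pi_{s+k}
= \frac{\pi_0}{s!}\left(\frac{\lambda}{\mu}\right)^s\sum_{k=0}^{\infty}k\left(\frac{\lambda}{s\mu}\right)^k
= \frac{\pi_0}{s!}\left(\frac{\lambda}{\mu}\right)^s\frac{\lambda/s\mu}{(1 - \lambda/s\mu)^2}. \tag{2.17}$$

Then

$$W_0 = \frac{L_0}{\lambda}, \qquad W = W_0 + \frac{1}{\mu},$$

and

$$L = \lambda W = \lambda\left(W_0 + \frac{1}{\mu}\right) = L_0 + \frac{\lambda}{\mu}.$$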
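The M/M/s formulas (2.16) and (2.17) are easy to mistype by hand, so a small numerical sketch may help. The function name `mms_metrics` and the sample rates below are illustrative assumptions, not part of the text.

```python
from math import factorial

def mms_metrics(lam, mu, s):
    """Stationary summary of an M/M/s queue; requires lam < s * mu."""
    if lam >= s * mu:
        raise ValueError("need lam < s * mu for a stationary distribution")
    a = lam / mu              # lambda / mu
    rho = lam / (s * mu)      # traffic intensity
    # Normalizing sum of the theta_k, which gives pi_0
    norm = sum(a**j / factorial(j) for j in range(s))
    norm += a**s / (factorial(s) * (1 - rho))
    pi0 = 1.0 / norm
    # (2.17): mean number waiting for, and not undergoing, service
    L0 = pi0 * a**s / factorial(s) * rho / (1 - rho) ** 2
    W0 = L0 / lam             # mean wait before service starts
    W = W0 + 1 / mu           # add the mean service time
    L = lam * W               # equals L0 + lam / mu
    return {"pi0": pi0, "L0": L0, "W0": W0, "W": W, "L": L}

# Hypothetical rates chosen only for illustration
m = mms_metrics(lam=3.0, mu=2.0, s=2)
print(m)
```

With `s=1` the function reduces to the M/M/1 results, which gives a quick sanity check on the implementation.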
Exercises 


2.1. Customers arrive at a tool crib according to a Poisson process of
rate $\lambda = 5$ per hour. There is a single tool crib employee, and the
individual service times are exponentially distributed with a mean service
time of 10 minutes. In the long run, what is the probability that two or
more workers are at the tool crib being served or waiting to be served?


2.2. On a single graph, plot the server utilization $1 - \pi_0 = \rho$ and
the mean queue length $L = \rho/(1 - \rho)$ for the M/M/1 queue as a function
of the traffic intensity $\rho = \lambda/\mu$ for $0 < \rho < 1$.


2.3. Customers arrive at a checkout station in a market according to a
Poisson process of rate $\lambda = 1$ customer per minute. The checkout
station can be operated with or without a bagger. The checkout times for
customers are exponentially distributed, and with a bagger the mean
checkout time is 30 seconds, while without a bagger this mean time
increases to 50 seconds. Compare the mean queue lengths with and without
a bagger.


Problems 


2.1. Determine explicit expressions for $\pi_0$ and $L$ for the M/M/s
queue when $s = 2$. Plot $1 - \pi_0$ and $L$ as a function of the traffic
intensity $\rho = \lambda/2\mu$.


2.2. Determine the mean waiting time $W$ for an M/M/2 system when
$\lambda = 2$ and $\mu = 1.2$. Compare this with the mean waiting time in an
M/M/1 system whose arrival rate is $\lambda = 1$ and service rate is
$\mu = 1.2$. Why is there a difference when the arrival rate per server is
the same in both cases?


2.3. Determine the stationary distribution for an M/M/2 system as a
function of the traffic intensity $\rho = \lambda/2\mu$, and verify that
$L = \lambda W$.


2.4. The problem is to model a queueing system having finite capacity.
We assume arrivals according to a Poisson process of rate $\lambda$,
independent exponentially distributed service times having mean $1/\mu$,
a single server, and a finite system capacity $N$. By this we mean that if
an arriving customer finds that there are already $N$ customers in the
system, then that customer does not enter the system and is lost.

Let $X(t)$ be the number of customers in the system at time $t$. Suppose
that $N = 3$ (2 waiting, 1 being served).


(a) Specify the birth and death parameters for $X(t)$.
(b) In the long run, what fraction of time is the system idle?
(c) In the long run, what fraction of customers are lost?


2.5. Customers arrive at a service facility according to a Poisson
process having rate $\lambda$. There is a single server, whose service times
are exponentially distributed with parameter $\mu$. Let $N(t)$ be the number
of people in the system at time $t$. Then $N(t)$ is a birth and death process
with parameters $\lambda_n = \lambda$ for $n \ge 0$ and $\mu_n = \mu$ for
$n \ge 1$. Assume $\lambda < \mu$. Then $\pi_k = (1 - \lambda/\mu)(\lambda/\mu)^k$,
$k \ge 0$, is a stationary distribution for $N(t)$; cf. equation (2.9).

Suppose the process begins according to the stationary distribution.
That is, suppose $\Pr\{N(0) = k\} = \pi_k$ for $k = 0, 1, \ldots$. Let $D(t)$
be the number of people completing service up to time $t$. Show that $D(t)$
has a Poisson distribution with mean $\lambda t$.


Hint: Let $P_{kj}(t) = \Pr\{D(t) = j \mid N(0) = k\}$ and
$P_j(t) = \sum_k \pi_k P_{kj}(t) = \Pr\{D(t) = j\}$. Use a first step analysis
to show that
$P_{0j}(t + \Delta t) = \lambda(\Delta t)P_{1j}(t) + [1 - \lambda(\Delta t)]P_{0j}(t) + o(\Delta t)$,
and for $k = 1, 2, \ldots$,

$$P_{kj}(t + \Delta t) = \mu(\Delta t)P_{k-1,j-1}(t) + \lambda(\Delta t)P_{k+1,j}(t) + [1 - (\lambda + \mu)(\Delta t)]P_{kj}(t) + o(\Delta t).$$

Then use $P_j(t) = \sum_k \pi_k P_{kj}(t)$ to establish a differential
equation. Use the explicit form of $\pi_k$ given in the problem.


2.6. Customers arrive at a service facility according to a Poisson
process of rate $\lambda$. There is a single server, whose service times are
exponentially distributed with parameter $\mu$. Suppose that “gridlock”
occurs whenever the total number of customers in the system exceeds a
capacity $C$. What is the smallest capacity $C$ that will keep the
probability of gridlock, under the limiting distribution of queue length,
below 0.001? Express your answer in terms of the traffic intensity
$\rho = \lambda/\mu$.


2.7. Let $X(t)$ be the number of customers in an M/M/∞ queueing system
at time $t$. Suppose that $X(0) = 0$.


(a) Derive the forward equations that are appropriate for this process
by substituting the birth and death parameters into VI, (3.8).

(b) Show that $M(t) = E[X(t)]$ satisfies the differential equation
$M'(t) = \lambda - \mu M(t)$ by multiplying the $j$th forward equation by $j$
and summing.

(c) Solve for $M(t)$.


We continue to assume that the arrivals follow a Poisson process of rate
$\lambda$. The successive customer service times $Y_1, Y_2, \ldots$, however,
are now allowed to follow an arbitrary distribution
$G(y) = \Pr\{Y_k \le y\}$ having a finite mean service time $\nu = E[Y_1]$.
The long run service rate is $\mu = 1/\nu$. Deterministic service times of an
equal fixed duration are an important special case.


3.1. The M/G/1 System 


If arrivals to a queue follow a Poisson process, then the successive
durations $X_k$ from the commencement of the $k$th busy period to the start
of the next busy period form a renewal process. (A busy period is an
uninterrupted duration when the queue is not empty. See Figure 2.1.) Each
$X_k$ is composed of a busy portion $B_k$ and an idle portion $I_k$. Then
$p_0(t)$, the probability that the system is empty at time $t$, converges to

$$\lim_{t\to\infty}p_0(t) = \pi_0 = \frac{E[I_1]}{E[X_1]} = \frac{E[I_1]}{E[I_1] + E[B_1]} \tag{3.1}$$

by the renewal theorem (see “A Queueing Model” in VII, Section 5.3).

The idle time is the duration from the completion of a service that
empties the queue to the instant of the next arrival. Because of the
memoryless property that characterizes the interarrival times in a Poisson
process, each idle time is exponentially distributed with mean
$E[I_1] = 1/\lambda$.

The busy period is composed of the first service time $Y_1$, plus busy
periods generated by all customers who arrive during this first service
time. Let $A$ denote this random number of new arrivals. We will evaluate
the conditional mean busy period given that $A = n$ and $Y_1 = y$. First,

$$E[B_1 \mid A = 0, Y_1 = y] = y,$$

because when no customers arrive, the busy period is composed of the
first customer’s service time alone. Next, consider the case in which
$A = 1$, and let $B'$ be the duration from the beginning of this customer’s
service to the next instant that the queue is empty. Then

$$E[B_1 \mid A = 1, Y_1 = y] = y + E[B'] = y + E[B_1],$$

because upon the completion of service for the initial customer, the single
arrival begins a busy period $B'$ that is statistically identical to the
first, so that $E[B'] = E[B_1]$. Continuing in this manner we deduce that


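$$E[B_1 \mid A = n, Y_1 = y] = y + nE[B_1]$$

and then, using the law of total probability, that

$$E[B_1 \mid Y_1 = y] = \sum_{n=0}^{\infty}E[B_1 \mid A = n, Y_1 = y]\Pr\{A = n \mid Y_1 = y\}
= \sum_{n=0}^{\infty}(y + nE[B_1])\frac{(\lambda y)^n e^{-\lambda y}}{n!}
= y + \lambda yE[B_1].$$

Finally,

$$E[B_1] = \int_0^{\infty}E[B_1 \mid Y_1 = y]\,dG(y)
= \int_0^{\infty}\{y + \lambda yE[B_1]\}\,dG(y)
= \nu\{1 + \lambda E[B_1]\}. \tag{3.2}$$

Since $E[B_1]$ appears on both sides of (3.2), we may solve to obtain

$$E[B_1] = \frac{\nu}{1 - \lambda\nu}, \qquad\text{provided that } \lambda\nu < 1. \tag{3.3}$$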


To compute the long run fraction of idle time, we use (3.3) and (3.1):

$$\pi_0 = \frac{E[I_1]}{E[I_1] + E[B_1]} = \frac{1/\lambda}{1/\lambda + \nu/(1 - \lambda\nu)} = 1 - \lambda\nu \qquad\text{if } \lambda\nu < 1. \tag{3.4}$$

Note that (3.4) agrees, as it must, with the corresponding expression
(2.8) obtained for the M/M/1 queue where $\nu = 1/\mu$. For example, if
arrivals occur at the rate of $\lambda = 2$ per hour and the mean service time
is 20 minutes, or $\nu = \frac{1}{3}$ hours, then in the long run the server is
idle $1 - 2(\frac{1}{3}) = \frac{1}{3}$ of the time.
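The worked example can be checked with exact rational arithmetic. The sketch below is only an illustration of (3.3) and (3.4); the helper names `mean_busy_period` and `idle_fraction` are assumptions introduced here.

```python
from fractions import Fraction

def mean_busy_period(lam, nu):
    """E[B_1] from (3.3); requires lam * nu < 1."""
    if lam * nu >= 1:
        raise ValueError("requires lam * nu < 1")
    return nu / (1 - lam * nu)

def idle_fraction(lam, nu):
    """pi_0 from (3.4), computed as E[I_1] / (E[I_1] + E[B_1])."""
    mean_idle = 1 / lam  # E[I_1] = 1 / lambda
    return mean_idle / (mean_idle + mean_busy_period(lam, nu))

# The example from the text: 2 arrivals per hour, 20-minute mean service
lam, nu = Fraction(2), Fraction(1, 3)
EB = mean_busy_period(lam, nu)
pi0 = idle_fraction(lam, nu)
print(EB, pi0)
```

Using `Fraction` avoids floating-point noise, so the fixed-point relation (3.2), $E[B_1] = \nu\{1 + \lambda E[B_1]\}$, can be confirmed exactly.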


The Embedded Markov Chain 


The number $X(t)$ of customers in the system at time $t$ is not a Markov
process for a general M/G/1 system, because if one is to predict the future
behavior of the system, one must know, in addition, the time expended in
service for the customer currently in service. (It is the memoryless
property of the exponential service time distribution that makes this
additional information unnecessary in the M/M/1 case.)

Let $X_n$, however, denote the number of customers in the system
immediately after the departure of the $n$th customer. Then $\{X_n\}$ is a
Markov chain. Indeed, we can write

$$X_n = \begin{cases} X_{n-1} - 1 + A_n & \text{if } X_{n-1} > 0,\\ A_n & \text{if } X_{n-1} = 0,\end{cases}
\;=\; (X_{n-1} - 1)^+ + A_n, \tag{3.5}$$

where $A_n$ is the number of customers that arrive during the service of the
$n$th customer and where $x^+ = \max\{x, 0\}$. Since the arrival process is
Poisson, the number of customers $A_n$ that arrive during the service of the
$n$th customer is independent of earlier arrivals, and the Markov property
follows instantly. We calculate

$$\alpha_k = \Pr\{A_n = k\} = \int_0^{\infty}\Pr\{A_n = k \mid Y_n = y\}\,dG(y)
= \int_0^{\infty}\frac{(\lambda y)^k e^{-\lambda y}}{k!}\,dG(y), \tag{3.6}$$

and then, for $j = 0, 1, \ldots$,

$$P_{ij} = \Pr\{X_n = j \mid X_{n-1} = i\} = \Pr\{A_n = j - (i - 1)^+\}
= \begin{cases}\alpha_{j-i+1} & \text{for } i \ge 1,\ j \ge i - 1,\\ \alpha_j & \text{for } i = 0.\end{cases} \tag{3.7}$$
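To make the structure of (3.7) concrete, the following sketch builds a truncated transition matrix for the embedded chain. It assumes deterministic service times of length $\nu$, so that the integral in (3.6) reduces to a single Poisson probability; the function names and the truncation size are illustrative choices, not from the text.

```python
from math import exp, factorial

def alpha_deterministic(k, lam, nu):
    """alpha_k of (3.6) when every service time equals nu (an assumption)."""
    return (lam * nu) ** k * exp(-lam * nu) / factorial(k)

def embedded_matrix(lam, nu, size):
    """Upper-left size x size corner of the matrix P_ij in (3.7)."""
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            k = j - max(i - 1, 0)   # number of arrivals needed: j - (i-1)^+
            if k >= 0:
                P[i][j] = alpha_deterministic(k, lam, nu)
    return P

P = embedded_matrix(lam=2.0, nu=1 / 3, size=6)
for row in P:
    print(" ".join(f"{p:.4f}" for p in row))
```

Rows 0 and 1 coincide, and each later row is the previous one shifted one column to the right. Row sums fall slightly below one only because the infinite matrix has been truncated.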


The Mean Queue Length in Equilibrium L 


The embedded Markov chain is of special interest in the M/G/1 queue
because in this particular instance, the stationary distribution $\{\pi_j\}$
for the Markov chain $\{X_n\}$ equals the limiting distribution for the queue
length process $\{X(t)\}$. That is,
$\lim_{t\to\infty}\Pr\{X(t) = j\} = \lim_{n\to\infty}\Pr\{X_n = j\}$. We will
use this helpful fact to evaluate the mean queue length $L$.

The equivalence between the stationary distribution for the Markov
chain $\{X_n\}$ and that for the non-Markov process $\{X(t)\}$ is rather
subtle. It is not the consequence of a general principle and should not be
assumed to hold in other circumstances without careful justification. The
equivalence in the case at hand is sketched in an appendix to this section.

We will calculate the expected queue length in equilibrium
$L = \lim_{t\to\infty}E[X(t)]$ by calculating the corresponding quantity in
the embedded Markov chain, $L = \lim_{n\to\infty}E[X_n]$. If $X = X_n$ is the
number of customers in the system after a customer departs and $X'$ is the
number after the next departure, then in accordance with (3.5),

$$X' = X - \delta + N, \tag{3.8}$$

where $N$ is the number of arrivals during the service period and

$$\delta = \begin{cases} 1 & \text{if } X > 0,\\ 0 & \text{if } X = 0.\end{cases}$$

In equilibrium, $X$ has the same distribution as does $X'$, and in particular,

$$L = E[X] = E[X'], \tag{3.9}$$

and taking expectation in (3.8) gives

$$E[X'] = E[X] - E[\delta] + E[N],$$

and, by (3.9) and (3.4), then

$$E[N] = E[\delta] = 1 - \pi_0 = \lambda\nu. \tag{3.10}$$

Squaring (3.8) gives

$$(X')^2 = X^2 + \delta^2 + N^2 - 2\delta X + 2N(X - \delta),$$

and since $\delta^2 = \delta$ and $X\delta = X$, then

$$(X')^2 = X^2 + \delta + N^2 - 2X + 2N(X - \delta). \tag{3.11}$$


Now $N$, the number of customers that arrive during a service period, is
independent of $X$, and hence of $\delta$, so that

$$E[N(X - \delta)] = E[N]E[X - \delta], \tag{3.12}$$

and because $X$ and $X'$ have the same distribution, then

$$E[(X')^2] = E[X^2]. \tag{3.13}$$

Taking expectations in (3.11) we deduce that

$$E[(X')^2] = E[X^2] + E[\delta] + E[N^2] - 2E[X] + 2E[N]E[X - \delta],$$

and then substituting from (3.10) and (3.13), we obtain

$$0 = \lambda\nu + E[N^2] - 2L + 2\lambda\nu\{L - \lambda\nu\},$$

or

$$L = \frac{\lambda\nu + E[N^2] - 2(\lambda\nu)^2}{2(1 - \lambda\nu)}. \tag{3.14}$$


It remains to evaluate $E[N^2]$, where $N$ is the number of arrivals during
a service time $Y$. Conditioned on $Y = y$, the random variable $N$ has a
Poisson distribution with a mean (and variance) equal to $\lambda y$ [see
(3.6)], whence $E[N^2 \mid Y = y] = \lambda y + (\lambda y)^2$. Using the law
of total probability then gives

$$E[N^2] = \int_0^{\infty}E[N^2 \mid Y = y]\,dG(y)
= \lambda\int_0^{\infty}y\,dG(y) + \lambda^2\int_0^{\infty}y^2\,dG(y)
= \lambda\nu + \lambda^2(\tau^2 + \nu^2), \tag{3.15}$$

where $\tau^2$ is the variance of the service time distribution $G(y)$.
Substituting (3.15) into (3.14) gives

$$L = \frac{2\lambda\nu + \lambda^2\tau^2 - (\lambda\nu)^2}{2(1 - \lambda\nu)}
= \rho + \frac{\lambda^2\tau^2 + \rho^2}{2(1 - \rho)}, \tag{3.16}$$

where $\rho = \lambda\nu$ is the traffic intensity.
Finally, $W = L/\lambda$, which simplifies to

$$W = \nu + \frac{\lambda(\tau^2 + \nu^2)}{2(1 - \rho)}. \tag{3.17}$$

The results (3.16) and (3.17) express somewhat surprising facts. They
say that for a given average arrival rate $\lambda$ and mean service time
$\nu$, we can decrease the expected queue size $L$ and waiting time $W$ by
decreasing the variance of service time. Clearly, the best possible case in
this respect corresponds to constant service times, for which $\tau^2 = 0$.
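The effect of service-time variance in (3.16) and (3.17) can be seen numerically. The sketch below compares exponential service (variance $\nu^2$) with constant service (variance 0) at the same mean; the function name `mg1_L_W` and the sample rates are assumptions made for illustration.

```python
def mg1_L_W(lam, nu, tau2):
    """Mean queue length (3.16) and mean waiting time (3.17) for M/G/1."""
    rho = lam * nu
    if rho >= 1:
        raise ValueError("requires traffic intensity lam * nu < 1")
    L = rho + (lam**2 * tau2 + rho**2) / (2 * (1 - rho))
    W = nu + lam * (tau2 + nu**2) / (2 * (1 - rho))
    return L, W

lam, nu = 1.0, 0.5                           # hypothetical rates
L_exp, W_exp = mg1_L_W(lam, nu, nu**2)       # exponential service
L_det, W_det = mg1_L_W(lam, nu, 0.0)         # constant service
print(L_exp, W_exp)
print(L_det, W_det)
```

In the exponential case the result coincides with the M/M/1 value $\rho/(1 - \rho)$, and in both cases $L = \lambda W$ holds.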


Appendix 


We sketch a proof of the equivalence between the limiting queue size
distribution and the limiting distribution for the embedded Markov chain in
an M/G/1 model. First, beginning at $t = 0$ let $\eta_n$ denote those
instants when the queue size $X(t)$ increases by one (an arrival), and let
$\xi_n$ denote those instants when $X(t)$ decreases by one (a departure). Let
$Y_n = X(\eta_n-)$ denote the queue length immediately prior to an arrival
and let $X_n = X(\xi_n+)$ denote the queue length immediately after a
departure. For any queue length $i$ and any time $t$, the number of visits of
$Y_n$ to $i$ up to time $t$ differs from the number of visits of $X_n$ to $i$
by at most one unit. Therefore, in the long run the average visits per unit
time of $Y_n$ to $i$ must equal the average visits of $X_n$ to $i$, which is
$\pi_i$, the stationary distribution of the Markov chain $\{X_n\}$. Thus we
need only show that the limiting distribution of $\{X(t)\}$ is the same as
that of $\{Y_n\}$, which is $X(t)$ just prior to an arrival. But because the
arrivals are Poisson, and arrivals in disjoint time intervals are
independent, it must be that $X(t)$ is independent of an arrival that occurs
at time $t$. It follows that $\{X(t)\}$ and $\{Y_n\}$ have the same limiting
distribution, and therefore $\{X(t)\}$ and the embedded Markov chain
$\{X_n\}$ have the same limiting distribution.


3.2. The M/G/∞ System


Complete results are available when each customer begins service
immediately upon arrival independently of other customers in the system.
Such situations may arise when modeling customer self-service systems. Let
$W_1, W_2, \ldots$ be the successive arrival times of customers, and let
$V_1, V_2, \ldots$ be the corresponding service times. In this notation, the
$k$th customer is in the system at time $t$ if and only if $W_k \le t$ (the
customer arrived prior to $t$) and $W_k + V_k > t$ (the service extends
beyond $t$).

The sequence of pairs $(W_1, V_1), (W_2, V_2), \ldots$ forms a marked
Poisson process (see V, Section 6.2), and we may use the corresponding
theory to quickly obtain results in this model. Figure 3.1 illustrates the
marked Poisson process.

Figure 3.1 For the M/G/∞ queue the number of customers in the system at
time $t$ corresponds to the number of pairs $(W_k, V_k)$ for which
$W_k \le t$ and $W_k + V_k > t$. In the sample illustrated here, the number
of customers in the system at time $t$ is 3.

Then $X(t)$, the number of customers in the system at time $t$, is also the
number of points $(W_k, V_k)$ for which $W_k \le t$ and $W_k + V_k > t$.
That is, it is the number of points $(W_k, V_k)$ in the unbounded trapezoid
described by


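$$A_t = \{(w, v) : 0 \le w \le t \text{ and } v > t - w\}. \tag{3.18}$$

According to V, Theorem 6.1, the number of points in $A_t$ follows a
Poisson distribution with mean

$$\mu(A_t) = \iint_{A_t}\lambda\,dw\,dG(v)
= \lambda\int_0^t\left\{\int_{t-w}^{\infty}dG(v)\right\}dw
= \lambda\int_0^t[1 - G(t - w)]\,dw
= \lambda\int_0^t[1 - G(x)]\,dx. \tag{3.19}$$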
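For a specific service distribution the integral in (3.19) can be approximated numerically. The sketch below uses a midpoint rule and, purely as an illustrative assumption, an exponential service distribution $G(x) = 1 - e^{-\mu x}$, for which the integral evaluates to $(\lambda/\mu)(1 - e^{-\mu t})$. The function names and the rates are hypothetical.

```python
from math import exp, factorial

def mean_in_system(lam, G, t, steps=10_000):
    """Approximate mu(A_t) = lam * integral_0^t [1 - G(x)] dx (midpoint rule)."""
    h = t / steps
    return lam * h * sum(1.0 - G((i + 0.5) * h) for i in range(steps))

def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    return mean**k * exp(-mean) / factorial(k)

lam, mu, t = 3.0, 2.0, 1.5                      # hypothetical rates and time
G = lambda x: 1.0 - exp(-mu * x)                # exponential service (assumed)
m = mean_in_system(lam, G, t)
closed_form = (lam / mu) * (1.0 - exp(-mu * t))
print(m, closed_form)
print([round(poisson_pmf(k, m), 4) for k in range(5)])
```

The list printed last gives the first few probabilities of the Poisson distribution with mean $\mu(A_t)$, that is, the distribution of the number of customers in the system at time $t$.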


In summary,

$$p_k(t) = \Pr\{X(t) = k\} = \frac{\mu(A_t)^k e^{-\mu(A_t)}}{k!} \qquad\text{for } k = 0, 1, \ldots,$$

where $\mu(A_t)$ is given by (3.19). As $t \to \infty$, then

$$\lim_{t\to\infty}\mu(A_t) = \lambda\int_0^{\infty}[1 - G(x)]\,dx = \lambda\nu,$$

where $\nu$ is the mean service time. Thus we obtain the limiting distribution

$$\pi_k = \frac{(\lambda\nu)^k e^{-\lambda\nu}}{k!} \qquad\text{for } k = 0, 1, \ldots.$$


Exercises


3.1. Suppose that the service distribution in a single server queue is
exponential with rate $\mu$; i.e., $G(v) = 1 - e^{-\mu v}$ for $v \ge 0$.
Substitute the mean and variance of this distribution into (3.16) and verify
that the result agrees with that derived for the M/M/1 system in (2.10).


3.2. Consider a single-server queueing system having Poisson arrivals
at rate $\lambda$. Suppose that the service times have the gamma density

$$g(y) = \frac{\mu^{\alpha}y^{\alpha-1}e^{-\mu y}}{\Gamma(\alpha)} \qquad\text{for } y \ge 0,$$

where $\alpha > 0$ and $\mu > 0$ are fixed parameters. The mean service time
is $\alpha/\mu$ and the variance is $\alpha/\mu^2$. Determine the equilibrium
mean queue length $L$.


3.3. Customers arrive at a tool crib according to a Poisson process of
rate $\lambda = 5$ per hour. There is a single tool crib employee, and the
individual service times are random with a mean service time of 10 minutes
and a standard deviation of 4 minutes. In the long run, what is the mean
number of workers at the tool crib either being served or waiting to be
served?


3.4. Customers arrive at a checkout station in a market according to a
Poisson process of rate $\lambda = 1$ customer per minute. The checkout
station can be operated with or without a bagger. The checkout times for
customers are random. With a bagger the mean checkout time is 30 seconds,
while without a bagger this mean time increases to 50 seconds. In both
cases, the standard deviation of service time is 10 seconds. Compare the
mean queue lengths with and without a bagger.


3.5. Let $X(t)$ be the number of customers in an M/G/∞ queueing system
at time $t$. Suppose that $X(0) = 0$. Evaluate $M(t) = E[X(t)]$, and show
that it increases monotonically to its limiting value as $t \to \infty$.


Problems 


3.1. Let $X(t)$ be the number of customers in an M/G/∞ queueing system
at time $t$, and let $Y(t)$ be the number of customers who have entered the
system and completed service by time $t$. Determine the joint distribution
of $X(t)$ and $Y(t)$.


3.2. In operating a queueing system with Poisson arrivals at a rate of
$\lambda = 1$ per unit time and a single server, you have a choice of server
mechanisms. Method A has a mean service time of $\nu = 0.5$ and a variance in
service time of $\tau^2 = 0.2$, while Method B has a mean service time of
$\nu = 0.4$ and a variance of $\tau^2 = 0.9$. In terms of minimizing the
waiting time of a typical customer, which method do you prefer? Would your
answer change if the arrival rate were to increase significantly?


4. Variations and Extensions 


In this section we consider a few variations on the simple queueing mod- 
els studied so far. These examples do not exhaust the possibilities but 
serve only to suggest the richness of the area. 

Throughout we restrict ourselves to Poisson arrivals and exponentially 
distributed service times. 


4.1. Systems with Balking 


Suppose that a customer who arrives when there are $n$ customers in
the system enters with probability $p_n$ and departs with probability
$q_n = 1 - p_n$. If long queues discourage customers, then $p_n$ would be a
decreasing function of $n$. As a special case, if there is a finite waiting
room of capacity $C$, we might suppose that

$$p_n = \begin{cases} 1 & \text{for } n < C,\\ 0 & \text{for } n \ge C,\end{cases}$$

indicating that once the waiting room is filled, no more customers can
enter the system.

Let $X(t)$ be the number of customers in the system at time $t$. If the
arrival process is Poisson at rate $\lambda$ and a customer who arrives when
there are $n$ customers in the system enters with probability $p_n$, then the
appropriate birth parameters are

$$\lambda_n = \lambda p_n \qquad\text{for } n = 0, 1, \ldots.$$

In the case of a single server, then $\mu_n = \mu$ for $n = 1, 2, \ldots$,
and we may evaluate the stationary distribution $\pi_n$ of queue length by
the usual means.

In systems with balking, not all arriving customers enter the system,
and some are lost. The input rate is the rate at which customers actually
enter the system in the stationary state and is given by

$$\lambda_I = \lambda\sum_{n=0}^{\infty}\pi_n p_n.$$

The rate at which customers are lost is $\lambda\sum_{n=0}^{\infty}\pi_n q_n$,
and the fraction of customers lost in the long run is

$$\text{fraction lost} = \sum_{n=0}^{\infty}\pi_n q_n.$$

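Let us examine in detail the case of an M/M/s system in which an
arriving customer enters the system if and only if a server is free. Then

$$\lambda_k = \begin{cases} \lambda & \text{for } k = 0, 1, \ldots, s - 1,\\ 0 & \text{for } k = s,\end{cases}$$

and

$$\mu_k = k\mu \qquad\text{for } k = 0, 1, \ldots, s.$$

To determine the limiting distribution, we have

$$\theta_k = \frac{1}{k!}\left(\frac{\lambda}{\mu}\right)^k \qquad\text{for } k = 0, 1, \ldots, s,$$

and then

$$\pi_k = \frac{\dfrac{1}{k!}\left(\dfrac{\lambda}{\mu}\right)^k}{\displaystyle\sum_{j=0}^{s}\frac{1}{j!}\left(\frac{\lambda}{\mu}\right)^j} \qquad\text{for } k = 0, 1, \ldots, s. \tag{4.1}$$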
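The distribution (4.1) is straightforward to tabulate. The following sketch computes it for hypothetical rates and reads off $\pi_s$, the probability that all $s$ servers are busy; the function name `loss_system_distribution` is an assumption introduced for illustration.

```python
from math import factorial

def loss_system_distribution(lam, mu, s):
    """Stationary probabilities pi_0..pi_s of (4.1) for the s-server loss system."""
    a = lam / mu
    weights = [a**k / factorial(k) for k in range(s + 1)]   # theta_k
    total = sum(weights)
    return [w / total for w in weights]

lam, mu, s = 3.0, 2.0, 2                      # hypothetical rates
pi = loss_system_distribution(lam, mu, s)
all_busy = pi[s]                              # probability every server is busy
input_rate = lam * (1.0 - all_busy)           # rate at which customers enter
print(pi, all_busy, input_rate)
```

Because an arrival is turned away exactly when all servers are busy, `input_rate` is the balking input rate $\lambda\sum_n \pi_n p_n$ with $p_n = 1$ for $n < s$ and $p_s = 0$.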


The long run fraction of customers lost is $\pi_s q_s = \pi_s$, since
$q_s = 1$ in this case.


4.2. Variable Service Rates


In a similar vein, one can consider a system whose service rate depends
on the number of customers in the system. For example, a second server
might be added to a single-server system whenever the queue length
exceeds a critical point $\xi$. If arrivals are Poisson and service times
are memoryless, then the appropriate birth and death parameters are

$$\lambda_k = \lambda \quad\text{for } k = 0, 1, \ldots, \qquad\text{and}\qquad
\mu_k = \begin{cases} \mu & \text{for } k \le \xi,\\ 2\mu & \text{for } k > \xi.\end{cases}$$

More generally, let us consider Poisson arrivals $\lambda_k = \lambda$ for
$k = 0, 1, \ldots$, and arbitrary service rates $\mu_k$ for
$k = 1, 2, \ldots$. The stationary distribution in this case is given by

$$\pi_k = \pi_0\frac{\lambda^k}{\mu_1\mu_2\cdots\mu_k} \qquad\text{for } k \ge 1, \tag{4.2}$$

where

$$\pi_0 = \left\{1 + \sum_{k=1}^{\infty}\frac{\lambda^k}{\mu_1\mu_2\cdots\mu_k}\right\}^{-1}. \tag{4.3}$$
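A stationary distribution of the birth and death form, with constant arrival rate and state-dependent service rates, can be approximated numerically by truncating the infinite normalizing sum. The sketch below does this for the second-server example; the truncation level `K`, the function name, and the sample rates are all assumptions made for illustration, and the truncation is reasonable only when the terms decay quickly.

```python
def stationary_variable_rates(lam, mu_of, K):
    """Approximate pi_0..pi_K for arrival rate lam and service rates mu_of(k).

    The infinite normalizing sum is truncated at k = K (an approximation).
    """
    theta = [1.0]
    for k in range(1, K + 1):
        # theta_k = lam^k / (mu_1 * mu_2 * ... * mu_k), built recursively
        theta.append(theta[-1] * lam / mu_of(k))
    total = sum(theta)
    return [th / total for th in theta]

# Second server joins whenever more than xi customers are present
lam, mu, xi = 1.0, 1.25, 3                    # hypothetical values
pi = stationary_variable_rates(lam, lambda k: mu if k <= xi else 2 * mu, K=200)

# Constant service rate: should reproduce the M/M/1 value pi_0 = 1 - lam/mu
pi_mm1 = stationary_variable_rates(0.5, lambda k: 1.0, K=200)
print(pi[:6])
print(pi_mm1[0])
```

Passing the service rates as a function keeps the same routine usable for any state-dependent rule, not only the two-level one shown here.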


4.3. A System with Feedback 


Consider a single-server system with Poisson arrivals and exponentially
distributed service times, but suppose that some customers, upon leaving
the server, return to the end of the queue for additional service. In
particular, suppose that a customer leaving the server departs from the
system with probability $q$ and returns to the queue for additional service
with probability $p = 1 - q$. Suppose that all such decisions are
statistically independent, and that a returning customer’s demands for
service are statistically the same as those of a customer arriving from
outside the system. Let the arrival rate be $\lambda$ and the service rate be
$\mu$. The queue system is depicted in Figure 4.1.

Figure 4.1 A queue with feedback. (Diagram: arrivals enter the queue,
pass through the server, and either depart or are fed back to the end of
the queue.)

Let $X(t)$ denote the number of customers in the system at time $t$. Then
$X(t)$ is a birth and death process with parameters $\lambda_n = \lambda$ for
$n = 0, 1, \ldots$ and $\mu_n = q\mu$ for $n = 1, 2, \ldots$. It is easily
deduced that the stationary distribution in the case that $\lambda < q\mu$ is

$$\pi_k = \left(1 - \frac{\lambda}{q\mu}\right)\left(\frac{\lambda}{q\mu}\right)^k \qquad\text{for } k = 0, 1, \ldots. \tag{4.4}$$

4.4. A Two-Server Overflow Queue 


Consider a two-server system where server $i$ has rate $\mu_i$ for
$i = 1, 2$. Arrivals to the system follow a Poisson process of rate
$\lambda$. A customer arriving when the system is empty goes to the first
server. A customer arriving when the first server is occupied goes to the
second server. If both servers are occupied, the customer is lost. The flow
is depicted in Figure 4.2.

Figure 4.2 A two-server overflow model. (Diagram: Poisson arrivals at rate
$\lambda$ go to Server #1, which has rate $\mu_1$; arrivals that find #1 busy
overflow to Server #2, which has rate $\mu_2$; arrivals that find both #1 and
#2 busy are lost; customers completing service form the output.)

The system state is described by the pair $(X(t), Y(t))$, where

$$X(t) = \begin{cases} 1 & \text{if Server \#1 is busy},\\ 0 & \text{if Server \#1 is idle},\end{cases}$$

and

$$Y(t) = \begin{cases} 1 & \text{if Server \#2 is busy},\\ 0 & \text{if Server \#2 is idle}.\end{cases}$$

The four states of the system are $\{(0, 0), (1, 0), (0, 1), (1, 1)\}$, and
transitions among these states occur at the rates given in the following
table:


From     To       Transition
State    State    Rate         Description
(0, 0)   (1, 0)   $\lambda$    Arrival when system is empty
(1, 0)   (0, 0)   $\mu_1$      Service completion by #1 when #2 is free
(1, 0)   (1, 1)   $\lambda$    Arrival when #1 is busy
(1, 1)   (1, 0)   $\mu_2$      Service completion by #2 when #1 is busy
(1, 1)   (0, 1)   $\mu_1$      Service completion by #1 when #2 is busy
(0, 1)   (1, 1)   $\lambda$    Arrival when #2 is busy and #1 is free
(0, 1)   (0, 0)   $\mu_2$      Service completion by #2 when #1 is free


The process $(X(t), Y(t))$ is a finite-state, continuous-time Markov chain
(see VI, Section 6), and the transition rates in the table furnish the
infinitesimal matrix of the Markov chain:

$$\mathbf{A} = \begin{array}{c|cccc}
 & (0,0) & (0,1) & (1,0) & (1,1)\\ \hline
(0,0) & -\lambda & 0 & \lambda & 0\\
(0,1) & \mu_2 & -(\lambda + \mu_2) & 0 & \lambda\\
(1,0) & \mu_1 & 0 & -(\lambda + \mu_1) & \lambda\\
(1,1) & 0 & \mu_1 & \mu_2 & -(\mu_1 + \mu_2)
\end{array}$$


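From VI, (6.11) and (6.12), we find the stationary distribution
$\pi = (\pi_{(0,0)}, \pi_{(0,1)}, \pi_{(1,0)}, \pi_{(1,1)})$ by solving
$\pi\mathbf{A} = 0$, or

$$\begin{aligned}
-\lambda\pi_{(0,0)} + \mu_2\pi_{(0,1)} + \mu_1\pi_{(1,0)} &= 0,\\
-(\lambda + \mu_2)\pi_{(0,1)} + \mu_1\pi_{(1,1)} &= 0,\\
\lambda\pi_{(0,0)} - (\lambda + \mu_1)\pi_{(1,0)} + \mu_2\pi_{(1,1)} &= 0,\\
\lambda\pi_{(0,1)} + \lambda\pi_{(1,0)} - (\mu_1 + \mu_2)\pi_{(1,1)} &= 0,
\end{aligned}$$

together with

$$\pi_{(0,0)} + \pi_{(0,1)} + \pi_{(1,0)} + \pi_{(1,1)} = 1.$$

Tedious but elementary algebra yields the solution:

$$\begin{aligned}
\pi_{(0,0)} &= \frac{\mu_1\mu_2(2\lambda + \mu_1 + \mu_2)}{D}, &
\pi_{(0,1)} &= \frac{\lambda^2\mu_1}{D},\\
\pi_{(1,0)} &= \frac{\lambda\mu_2(\lambda + \mu_1 + \mu_2)}{D}, &
\pi_{(1,1)} &= \frac{\lambda^2(\lambda + \mu_2)}{D},
\end{aligned} \tag{4.5}$$

where

$$D = \mu_1\mu_2(2\lambda + \mu_1 + \mu_2) + \lambda^2\mu_1 + \lambda\mu_2(\lambda + \mu_1 + \mu_2) + \lambda^2(\lambda + \mu_2).$$

The fraction of customers that are lost, in the long run, is the same as the
fraction of time that both servers are busy,
$\pi_{(1,1)} = \lambda^2(\lambda + \mu_2)/D$.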
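Since the algebra behind the closed-form solution is tedious, a numerical check of the four balance equations is a useful safeguard. The sketch below evaluates the closed form for hypothetical rates and computes the residual of each equation of $\pi\mathbf{A} = 0$; the function names and the rates are assumptions introduced here.

```python
def overflow_stationary(lam, mu1, mu2):
    """Closed-form stationary distribution of the two-server overflow model."""
    D = (mu1 * mu2 * (2 * lam + mu1 + mu2) + lam**2 * mu1
         + lam * mu2 * (lam + mu1 + mu2) + lam**2 * (lam + mu2))
    return {
        (0, 0): mu1 * mu2 * (2 * lam + mu1 + mu2) / D,
        (0, 1): lam**2 * mu1 / D,
        (1, 0): lam * mu2 * (lam + mu1 + mu2) / D,
        (1, 1): lam**2 * (lam + mu2) / D,
    }

def balance_residuals(pi, lam, mu1, mu2):
    """Left-hand sides of the four equations pi A = 0; all should vanish."""
    return [
        -lam * pi[0, 0] + mu2 * pi[0, 1] + mu1 * pi[1, 0],
        -(lam + mu2) * pi[0, 1] + mu1 * pi[1, 1],
        lam * pi[0, 0] - (lam + mu1) * pi[1, 0] + mu2 * pi[1, 1],
        lam * pi[0, 1] + lam * pi[1, 0] - (mu1 + mu2) * pi[1, 1],
    ]

lam, mu1, mu2 = 2.0, 3.0, 1.5                 # hypothetical rates
pi = overflow_stationary(lam, mu1, mu2)
res = balance_residuals(pi, lam, mu1, mu2)
print(pi)
print(res)
print("fraction lost:", pi[1, 1])
```

The residuals come out at floating-point zero and the four probabilities sum to one, which is what the closed form requires.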


4.5. Preemptive Priority Queues 


Consider a single-server queueing process that has two classes of
customers, priority and nonpriority, forming independent Poisson arrival
processes of rates $\alpha$ and $\beta$, respectively. The customer service
times are independent and exponentially distributed with parameters
$\gamma$ and $\delta$, respectively. Within classes there is a first come,
first served discipline, and the service of priority customers is never
interrupted. If a priority customer arrives during the service of a
nonpriority customer, then the latter’s service is immediately stopped in
favor of the priority customer. The interrupted customer’s service is
resumed when there are no priority customers present.

Let us introduce some convenient notation. The system arrival rate is
$\lambda = \alpha + \beta$, of which the fraction $p = \alpha/\lambda$ are
priority customers and $q = \beta/\lambda$ are nonpriority customers. The
system mean service time is given by the appropriately weighted means
$1/\gamma$ and $1/\delta$ of the priority and nonpriority customers,
respectively, or

$$\frac{1}{\mu} = p\left(\frac{1}{\gamma}\right) + q\left(\frac{1}{\delta}\right) = \frac{1}{\lambda}\left(\frac{\alpha}{\gamma} + \frac{\beta}{\delta}\right), \tag{4.6}$$

where $\mu$ is the system service rate. Finally, we introduce the traffic
intensities $\rho = \lambda/\mu$ for the system, and $\sigma = \alpha/\gamma$
and $\tau = \beta/\delta$ for the priority and nonpriority customers,
respectively. From (4.6) we see that $\rho = \sigma + \tau$.

The state of the system is described by the pair $(X(t), Y(t))$, where
$X(t)$ is the number of priority customers in the system and $Y(t)$ is the
number of nonpriority customers. Observe that the priority customers view
the system as simply an M/M/1 queue. Accordingly, we have the limiting
distribution from (2.9) to be

$$\lim_{t\to\infty}\Pr\{X(t) = m\} = (1 - \sigma)\sigma^m \qquad\text{for } m = 0, 1, \ldots, \tag{4.7}$$

provided $\sigma = \alpha/\gamma < 1$.
Reference to (2.10) and (2.13) gives us, respectively, the mean queue
length for priority customers

$$L_p = \frac{\alpha}{\gamma - \alpha} = \frac{\sigma}{1 - \sigma} \tag{4.8}$$

and the mean wait for priority customers

$$W_p = \frac{1}{\gamma - \alpha}. \tag{4.9}$$


To obtain information about the nonpriority customers is not as easy,
since these arrivals are strongly affected by the priority customers.
Nevertheless, $(X(t), Y(t))$ is a discrete-state, continuous-time Markov
chain, and the techniques of VI, Section 6 enable us to describe the
limiting distribution, when it exists. The transition rates of the
$(X(t), Y(t))$ Markov chain are described in the following table:


From      To            Transition
State     State         Rate        Description
(m, n)    (m + 1, n)    $\alpha$    Arrival of priority customer
(m, n)    (m, n + 1)    $\beta$     Arrival of nonpriority customer
(0, n)    (0, n - 1)    $\delta$    Completion of nonpriority service, $n \ge 1$
(m, n)    (m - 1, n)    $\gamma$    Completion of priority service, $m \ge 1$


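Let

$$\pi_{m,n} = \lim_{t\to\infty}\Pr\{X(t) = m, Y(t) = n\}$$

be the limiting distribution of the process. Reasoning analogous to that of
VI, (6.11) and (6.12) (where the theory was derived for a finite-state
Markov chain) leads to the following equations for the stationary
distribution:

$$(\alpha + \beta)\pi_{0,0} = \gamma\pi_{1,0} + \delta\pi_{0,1}, \tag{4.10}$$

$$(\alpha + \beta + \gamma)\pi_{m,0} = \gamma\pi_{m+1,0} + \alpha\pi_{m-1,0}, \qquad m \ge 1, \tag{4.11}$$

$$(\alpha + \beta + \delta)\pi_{0,n} = \gamma\pi_{1,n} + \delta\pi_{0,n+1} + \beta\pi_{0,n-1}, \qquad n \ge 1, \tag{4.12}$$

$$(\alpha + \beta + \gamma)\pi_{m,n} = \gamma\pi_{m+1,n} + \beta\pi_{m,n-1} + \alpha\pi_{m-1,n}, \qquad m, n \ge 1. \tag{4.13}$$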


The transition rates leading to equation (4.13) are shown in Figure 4.3.

Figure 4.3 In equilibrium, the rate of flow into any state must equal the
rate of flow out. Illustrated here is the state $(m, n)$ when $m \ge 1$ and
$n \ge 1$, leading to equation (4.13). (Diagram: rate in from
$\pi_{m+1,n}$ at rate $\gamma$, from $\pi_{m,n-1}$ at rate $\beta$, and from
$\pi_{m-1,n}$ at rate $\alpha$; rate out from $\pi_{m,n}$ at rate
$\alpha + \beta + \gamma$.)

In principle, these equations, augmented with the condition
$\sum_m\sum_n\pi_{m,n} = 1$, may be solved for the stationary distribution,
when it exists. We will content ourselves with determining the mean number
$L_n$ of nonpriority customers in the system in steady state, given by

$$L_n = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty}n\,\pi_{m,n}. \tag{4.14}$$

We introduce the notation

$$M_m = \sum_{n=0}^{\infty}n\,\pi_{m,n} = \sum_{n=1}^{\infty}n\,\pi_{m,n}, \tag{4.15}$$

so that

$$L_n = M_0 + M_1 + \cdots. \tag{4.16}$$

Using (4.7), let

$$p_m = \Pr\{X(t) = m\} = \sum_{n=0}^{\infty}\pi_{m,n} = (1 - \sigma)\sigma^m \tag{4.17}$$

and

$$\pi_n = \Pr\{Y(t) = n\} = \sum_{m=0}^{\infty}\pi_{m,n}. \tag{4.18}$$


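We begin by summing both sides of (4.10) and (4.11) for $m = 0, 1, \ldots$
to obtain

$$(\alpha + \beta)\pi_0 + \gamma\sum_{m=1}^{\infty}\pi_{m,0} = \gamma\sum_{m=1}^{\infty}\pi_{m,0} + \delta\pi_{0,1} + \alpha\pi_0,$$

which simplifies to give

$$\beta\pi_0 = \delta\pi_{0,1}. \tag{4.19}$$

Next, we sum (4.12) and (4.13) over $m = 0, 1, \ldots$ to obtain

$$(\alpha + \beta)\pi_n + \delta\pi_{0,n} + \gamma\sum_{m=1}^{\infty}\pi_{m,n} = \gamma\sum_{m=1}^{\infty}\pi_{m,n} + \delta\pi_{0,n+1} + \beta\pi_{n-1} + \alpha\pi_n,$$

which simplifies to

$$\beta\pi_n + \delta\pi_{0,n} = \beta\pi_{n-1} + \delta\pi_{0,n+1},$$

and inductively with (4.19), we obtain

$$\beta\pi_n = \delta\pi_{0,n+1} \qquad\text{for } n = 0, 1, \ldots. \tag{4.20}$$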


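Summing (4.20) over $n = 0, 1, \ldots$ and using $\sum_n \pi_n = 1$, we get

$$\beta = \delta\sum_{n=0}^{\infty}\pi_{0,n+1} = \delta\Pr\{X(t) = 0, Y(t) > 0\},$$

or

$$\Pr\{X(t) = 0, Y(t) > 0\} = \sum_{n=1}^{\infty}\pi_{0,n} = \frac{\beta}{\delta} = \tau. \tag{4.21}$$

Since (4.17) asserts that $\Pr\{X(t) = 0\} = 1 - (\alpha/\gamma) = 1 - \sigma$,
we have

$$\pi_{0,0} = \Pr\{X(t) = 0, Y(t) = 0\} = \Pr\{X(t) = 0\} - \Pr\{X(t) = 0, Y(t) > 0\}
= 1 - \frac{\alpha}{\gamma} - \frac{\beta}{\delta} = 1 - \sigma - \tau \qquad\text{when } \sigma + \tau < 1. \tag{4.22}$$

With these preliminary results in hand, we turn to determining
$M_m = \sum_{n=0}^{\infty}n\,\pi_{m,n}$. Multiplying (4.12) by $n$ and
summing, we derive

$$\begin{aligned}
(\alpha + \beta + \delta)M_0 &= \gamma M_1 + \delta\sum_{n=1}^{\infty}n\,\pi_{0,n+1} + \beta\sum_{n=1}^{\infty}n\,\pi_{0,n-1}\\
&= \gamma M_1 + \delta M_0 - \delta\sum_{n=0}^{\infty}\pi_{0,n+1} + \beta M_0 + \beta\sum_{n=1}^{\infty}\pi_{0,n-1}\\
&= \gamma M_1 + \delta M_0 - \delta\left(\frac{\beta}{\delta}\right) + \beta M_0 + \beta(1 - \sigma),
\end{aligned}$$

where the last line results from (4.17) and (4.21). After simplification and
rearrangement, the result is

$$M_1 = \sigma M_0 + \frac{\beta}{\gamma}\sigma. \tag{4.23}$$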


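We next multiply (4.13) by $n$ and sum to obtain

$$\begin{aligned}
(\alpha + \beta + \gamma)M_m &= \gamma M_{m+1} + \beta\sum_{n=1}^{\infty}n\,\pi_{m,n-1} + \alpha M_{m-1}\\
&= \gamma M_{m+1} + \beta M_m + \beta\sum_{n=1}^{\infty}\pi_{m,n-1} + \alpha M_{m-1}.
\end{aligned}$$

Again, referring to (4.17) and simplifying, we see that

$$(\alpha + \gamma)M_m = \gamma M_{m+1} + \alpha M_{m-1} + \beta(1 - \sigma)\sigma^m \qquad\text{for } m = 1, 2, \ldots. \tag{4.24}$$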
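Equations (4.23) and (4.24) can be solved inductively to give

$$M_m = M_0\sigma^m + \frac{\beta}{\gamma}m\sigma^m \qquad\text{for } m = 0, 1, \ldots,$$

which we sum to obtain

$$L_n = \sum_{m=0}^{\infty}M_m = \frac{1}{1 - \sigma}\left[M_0 + \frac{\beta}{\gamma}\,\frac{\sigma}{1 - \sigma}\right]. \tag{4.25}$$

This determines $L_n$ in terms of $M_0$. To obtain a second relation, we
multiply (4.20) by $n$ and sum to obtain

$$\beta L_n = \delta\sum_{n=0}^{\infty}n\,\pi_{0,n+1} = \delta M_0 - \delta\sum_{n=0}^{\infty}\pi_{0,n+1}
= \delta M_0 - \delta\left(\frac{\beta}{\delta}\right) \qquad\text{[see (4.21)]},$$

or

$$M_0 = \frac{\beta}{\delta}(L_n + 1) = \tau(L_n + 1). \tag{4.26}$$

We substitute (4.26) into (4.25) and simplify, yielding

$$L_n = \frac{1}{1 - \sigma}\left[\tau(L_n + 1) + \frac{\beta}{\gamma}\,\frac{\sigma}{1 - \sigma}\right],$$

$$\left(1 - \frac{\tau}{1 - \sigma}\right)L_n = \frac{1}{1 - \sigma}\left[\tau + \frac{\beta}{\gamma}\,\frac{\sigma}{1 - \sigma}\right],$$

and finally,

$$L_n = \frac{\tau}{1 - \sigma - \tau}\left[1 + \left(\frac{\delta}{\gamma}\right)\frac{\sigma}{1 - \sigma}\right]. \tag{4.27}$$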


The condition that $L_n$ be finite (and that a stationary distribution
exist) is that

$$\rho = \sigma + \tau < 1.$$

That is, the system traffic intensity $\rho$ must be less than one.

Since the arrival rate for nonpriority customers is $\beta$, we have that
the mean waiting time for nonpriority customers is given by
$W_n = L_n/\beta$.

Some simple numerical studies of (4.8) and (4.27) yield surprising re- 
sults concerning adding priority to an existing system. Let us consider first 
a simple M/M/1 system with traffic intensity p whose mean queue length 
is given by (2.10) to be L = p/(1 — p). Let us propose modifying the sys- 
tem in such a way that a fraction p = ; of the customers have priority. We 
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assume that priority is independent of service time. These assumptions 
lead to the values a = B = 5A and y = 6 = p, whence o = t= p/2. Then 
the mean queue lengths for priority and nonpriority customers are given 
by 


and 


_ pl2 | pl2 |= p 
. +e (2 — p)(1 — p) 


The mean queue lengths $L$, $L_p$, and $L_n$ were determined for several
values of the traffic intensity $\rho$. The results are listed in the
following table:

    $\rho$      $L$      $L_p$     $L_n$
    0.6        1.50     0.43      1.07
    0.8        4.00     0.67      3.34
    0.9        9.00     0.82      8.19
    0.95      19.00     0.90     18.10


It is seen that the burden of increased queue length, as the traffic
intensity increases, is carried almost exclusively by the nonpriority
customers!
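
The table can be recomputed with a few lines of code. The following is a minimal sketch, not part of the original text: the function names are illustrative, the nonpriority formula is (4.27), and the priority formula $L_p = \sigma/(1 - \sigma)$ is the M/M/1 mean queue length applied to the priority customers alone, which is an assumption consistent with the tabulated values. The computed values agree with the table up to rounding in the last printed digit.

```python
def mm1_mean_queue(rho):
    """Mean queue length L = rho / (1 - rho) of an M/M/1 system."""
    if not 0 <= rho < 1:
        raise ValueError("need 0 <= rho < 1")
    return rho / (1.0 - rho)


def priority_mean_queues(sigma, tau, delta_over_gamma=1.0):
    """Return (L_p, L_n) for the preemptive priority queue.

    sigma = alpha/gamma and tau = beta/delta are the traffic intensities of
    priority and nonpriority customers; delta_over_gamma is the ratio of
    the two service rates (1.0 when priority is independent of service time).
    """
    if sigma < 0 or tau < 0 or sigma + tau >= 1:
        raise ValueError("need sigma, tau >= 0 and sigma + tau < 1")
    l_p = sigma / (1.0 - sigma)  # assumed: M/M/1 formula for priority class
    l_n = (tau / (1.0 - sigma - tau)) * (
        1.0 + delta_over_gamma * sigma / (1.0 - sigma)
    )  # equation (4.27)
    return l_p, l_n


if __name__ == "__main__":
    print("  rho       L     L_p     L_n")
    for rho in (0.6, 0.8, 0.9, 0.95):
        total = mm1_mean_queue(rho)
        l_p, l_n = priority_mean_queues(rho / 2, rho / 2)
        print(f"{rho:5.2f} {total:7.2f} {l_p:7.2f} {l_n:7.2f}")
```

Because service is memoryless and priority is independent of service time, the two class means add up to the M/M/1 mean, $L_p + L_n = L$, which gives a quick consistency check on any row.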


Exercises 


4.1. Consider a two-server system in which an arriving customer enters
the system if and only if a server is free. Suppose that customers arrive
according to a Poisson process of rate $\lambda = 10$ customers per hour,
and that service times are exponentially distributed with a mean service
time of six minutes. In the long run, what is the rate of customers served
per hour?


4.2. Customers arrive at a checkout station in a small grocery store
according to a Poisson process of rate $\lambda = 1$ customer per minute.
The checkout station can be operated with or without a bagger. The
checkout times for customers are exponentially distributed, and with a
bagger the mean checkout time is 30 seconds, while without a bagger this
mean time increases to 50 seconds. Suppose the store's policy is to have
the bagger help whenever there are two or more customers in the checkout
line. In the long run, what fraction of time is the bagger helping the
cashier?


4.3. Consider a two-server system in which an arriving customer enters
the system if and only if a server is free. Suppose that customers arrive
according to a Poisson process of rate $\lambda = 10$ customers per hour,
and that service times are exponentially distributed. The servers have
different experience in the job, and the newer server has a mean service
time of six minutes, while the older has a mean service time of four
minutes. In the long run, what is the rate of customers served per hour?
Be explicit about any additional assumptions that you make.


4.4. Suppose that incoming calls to an office follow a Poisson process
of rate $\lambda = 6$ per hour. If the line is in use at the time of an
incoming call, the secretary has a HOLD button that will enable a single
additional caller to wait. Suppose that the lengths of conversations are
exponentially distributed with a mean length of 5 minutes, that incoming
calls while a caller is on hold are lost, and that outgoing calls can be
ignored. Apply the results of Section 4.1 to determine the fraction of
calls that are lost.


Problems 


4.1. Consider the two-server overflow queue of Section 4.4 and suppose
the arrival rate is $\lambda = 10$ per hour. The two servers have rates 6
and 4 per hour. Recommend which server should be placed first. That is,
choose between

$$\mu_1 = 6,\ \mu_2 = 4 \qquad \text{and} \qquad \mu_1 = 4,\ \mu_2 = 6,$$

and justify your answer. Be explicit about your criterion.


4.2. Consider the preemptive priority queue of Section 4.5 and suppose
that the arrival rate is $\lambda = 4$ per hour. Two classes of customers
can be identified, having mean service times of 12 minutes and 8 minutes,
and it is proposed to give one of these classes priority over the other.
Recommend which class should have priority. Be explicit about your
criterion and justify your answer. Assume that the two classes appear in
equal proportions and that all service times are exponentially
distributed.


4.3. Balking refers to the refusal of an arriving customer to enter the
queue. Reneging refers to the departure of a customer in the queue before
obtaining service. Consider an M/M/1 system with reneging such that the
probability that a specified single customer in line will depart prior to
service in a short time interval $(t, t + \Delta t]$ is
$r_n(\Delta t) + o(\Delta t)$ when n is the number of customers in the
system. (Note that $r_0 = r_1 = 0$.) Assume Poisson arrivals at rate
$\lambda$ and exponential service times with parameter $\mu$, and
determine the stationary distribution when it exists.


4.4. A small grocery store has a single checkout counter with a full-time
cashier. Customers arrive at the checkout according to a Poisson process
of rate $\lambda$ per hour. When there is only a single customer at the
counter, the cashier works alone at a mean service rate of $\alpha$ per
hour. Whenever there is more than one customer at the checkout, however,
a "bagger" is added, increasing the service rate to $\beta$ per hour.
Assume that service times are exponentially distributed and determine the
stationary distribution of the queue length.


4.5. A ticket office has two agents answering incoming phone calls. In
addition, a third caller can be put on HOLD until one of the agents
becomes available. If all three phone lines (both agent lines plus the
hold line) are busy, a potential caller gets a busy signal, and is
assumed lost. Suppose that the calls and attempted calls occur according
to a Poisson process of rate $\lambda$, and that the length of a telephone
conversation is exponentially distributed with parameter $\mu$. Determine
the stationary distribution for the process.


5. Open Acyclic Queueing Networks 


Queueing networks, composed of groups of service stations, with the de- 
partures of some stations forming the arrivals of others, arise in computer 
and information processing systems, manufacturing job shops, service
industries such as hospitals and airport terminals, and in many other
contexts. A remarkable result often enables the steady-state behavior of these
complex systems to be analyzed component by component. 


5.1. The Basic Theorem 


The result alluded to in the preceding paragraph asserts that the departures 
from a queue with Poisson arrivals and exponentially distributed service 
times in statistical equilibrium also form a Poisson process. We give the 
precise statement as Theorem 5.1. The proof is contained in an appendix 
at the end of this section. See also Problem 2.5. 


Theorem 5.1 Let $\{X(t), t \ge 0\}$ be a birth and death process with
constant birth parameters $\lambda_n = \lambda$ for $n = 0, 1, \ldots$,
and arbitrary death parameters $\mu_n$ for $n = 1, 2, \ldots$. Suppose
there exists a stationary distribution $\pi_k \ge 0$ where
$\sum_k \pi_k = 1$ and that $\Pr\{X(0) = k\} = \pi_k$ for
$k = 0, 1, \ldots$. Let $D(t)$ denote the number of deaths in $(0, t]$.
Then

$$\Pr\{X(t) = k, D(t) = j\} = \Pr\{X(t) = k\}\Pr\{D(t) = j\} = \pi_k\,\frac{e^{-\lambda t}(\lambda t)^j}{j!} \quad \text{for } k, j \ge 0.$$


Remark The stipulated conditions are satisfied, for example, when X(t)
is the number of customers in an M/M/s queueing system that is in steady
state wherein $\Pr\{X(0) = j\} = \pi_j$, the stationary distribution of
the process. In this case, a stationary distribution exists provided that
$\lambda < s\mu$, where $\mu$ is the individual service rate.

To see the major importance of this theorem, suppose that X(t)
represents the number of customers in some queueing system at time t. The
theorem asserts that the departures form a Poisson process of rate
$\lambda$. Furthermore, the number D(t) of departures up to time t is
independent of the number X(t) of customers remaining in the system at
time t.

We caution the reader that the foregoing analysis applies only if the
processes are in statistical equilibrium where the stationary distribution
$\pi_k = \Pr\{X(t) = k\}$ applies. In contrast, under the condition that
X(0) = 0, neither will the departures form a Poisson process, nor will
D(t) be independent of X(t).
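
As an illustration of Theorem 5.1, the joint probability can be evaluated directly for the simplest case. This is a sketch under stated assumptions: it takes a single-server M/M/1 queue in steady state, so that $\pi_k = (1 - \lambda/\mu)(\lambda/\mu)^k$; the function name `mm1_joint_prob` is hypothetical.

```python
import math


def mm1_joint_prob(k, j, lam, mu, t):
    """Pr{X(t) = k, D(t) = j} for a stationary M/M/1 queue (Theorem 5.1).

    Assumes lam < mu so that the geometric stationary distribution exists.
    """
    if not 0 < lam < mu:
        raise ValueError("need 0 < lam < mu")
    rho = lam / mu
    pi_k = (1.0 - rho) * rho**k  # stationary queue-length probability
    poisson_j = math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
    return pi_k * poisson_j  # product form: X(t) and D(t) are independent


if __name__ == "__main__":
    # Probability of an empty system and exactly two departures by t = 1.5.
    print(mm1_joint_prob(0, 2, lam=1.0, mu=2.0, t=1.5))
```

Summing the returned values over all k and j gives 1, since each factor is itself a probability distribution.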


5.2. Two Queues in Tandem 


Let us use Theorem 5.1 to analyze a simple queueing network composed 
of two single-server queues connected in series as shown in Figure 5.1. 

Let $X_k(t)$ be the number of customers in the kth queue at time t. We
assume steady state. Beginning with the first server, the stationary
distribution (2.9) for a single server queue applies, and

$$\Pr\{X_1(t) = n\} = \left(1 - \frac{\lambda}{\mu_1}\right)\left(\frac{\lambda}{\mu_1}\right)^n \quad \text{for } n = 0, 1, \ldots.$$


Theorem 5.1 asserts that the departure process from the first server,
denoted by $D_1(t)$, is a Poisson process of rate $\lambda$ that is
statistically independent of the first queue length $X_1(t)$. These
departures form the arrivals to the second server, and therefore the
second system has Poisson arrivals and is thus an M/M/1 queue as well.
Thus, again using (2.9),

$$\Pr\{X_2(t) = m\} = \left(1 - \frac{\lambda}{\mu_2}\right)\left(\frac{\lambda}{\mu_2}\right)^m \quad \text{for } m = 0, 1, \ldots.$$


[Diagram: Poisson arrivals at rate $\lambda$ enter Server 1 (rate $\mu_1$),
whose departures enter Server 2 (rate $\mu_2$).]

Figure 5.1 Two queues in series in which the departures from the first form
the arrivals for the second.
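Furthermore, because the departures $D_1(t)$ from the first server are
independent of $X_1(t)$, it must be that $X_2(t)$ is independent of
$X_1(t)$. We thus obtain the joint distribution

$$\begin{aligned}
\Pr\{X_1(t) = n \text{ and } X_2(t) = m\} &= \Pr\{X_1(t) = n\}\Pr\{X_2(t) = m\} \\
&= \left(1 - \frac{\lambda}{\mu_1}\right)\left(\frac{\lambda}{\mu_1}\right)^n \left(1 - \frac{\lambda}{\mu_2}\right)\left(\frac{\lambda}{\mu_2}\right)^m \quad \text{for } n, m = 0, 1, \ldots.
\end{aligned}$$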


We again caution the reader that the foregoing analysis applies only
when the network is in its limiting distribution. In contrast, if both
queues are empty at time t = 0, then neither will the departures $D_1(t)$
form a Poisson process nor will $D_1(t)$ and $X_1(t)$ be independent.
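
The product form for two queues in tandem is easy to evaluate numerically. The sketch below is illustrative only; the function name and the sample rates are assumptions, and the formula is the steady-state joint distribution derived above.

```python
def tandem_joint_pmf(n, m, lam, mu1, mu2):
    """Steady-state Pr{X1 = n, X2 = m} for two M/M/1 queues in series.

    lam is the Poisson arrival rate; mu1 and mu2 are the service rates.
    Requires lam < min(mu1, mu2) so that both queues are stable.
    """
    if not (0 < lam < mu1 and lam < mu2):
        raise ValueError("need 0 < lam < min(mu1, mu2)")
    r1, r2 = lam / mu1, lam / mu2
    return (1.0 - r1) * r1**n * (1.0 - r2) * r2**m


if __name__ == "__main__":
    # Hypothetical rates: lam = 1, mu1 = 2, mu2 = 4.
    print(tandem_joint_pmf(0, 0, 1.0, 2.0, 4.0))  # both queues empty
    print(tandem_joint_pmf(2, 1, 1.0, 2.0, 4.0))
```

The probability that both queues are empty is the product of the two idle probabilities, $(1 - \lambda/\mu_1)(1 - \lambda/\mu_2)$.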


5.3. Open Acyclic Networks 


The preceding analysis of two queues in series applies to more general
systems. An open queueing network (see Figure 5.2) has customers arriving
from and departing to the outside world. (The repairman model in VI,
Section 4 is a prototypical closed queueing network.) Consider an open
network having K service stations, and let $X_k(t)$ be the number of
customers in queue k at time t. Suppose


1. The arrivals from outside the system to distinct servers form
   independent Poisson processes.

2. The departures from distinct servers independently travel instantly
   to other servers, or leave the system, with fixed probabilities.

3. The service times for the various servers are memoryless in the sense
   that

   $$\Pr\{\text{Server \#}k \text{ completes a service in } (t, t + \Delta t] \mid X_k(t) = n\} = \mu_{kn}(\Delta t) + o(\Delta t) \quad \text{for } n = 1, 2, \ldots, \tag{5.1}$$

   and does not otherwise depend on the past.

4. The system is in statistical equilibrium (steady state).

5. The network is acyclic in that a customer can visit any particular
   server at most once. (The case where a customer can visit a server
   more than once is more subtle, and is treated in the next section.)

Figure 5.2 An open queueing network.


Then

(a) $X_1(t), X_2(t), \ldots, X_K(t)$ are independent processes, where

$$\Pr\{X_1(t) = n_1, X_2(t) = n_2, \ldots, X_K(t) = n_K\} = \Pr\{X_1(t) = n_1\}\Pr\{X_2(t) = n_2\}\cdots\Pr\{X_K(t) = n_K\}. \tag{5.2}$$

(b) The departure process $D_k(t)$ associated with the kth server is a
    Poisson process, and $D_k(t)$ and $X_k(t)$ are independent.
(c) The arrivals to the kth station form a Poisson process of rate
    $\lambda_k$.
(d) The departure rate at the kth server equals the rate of arrivals to
    that server.

Let us add some notation so as to be able to express these results more
explicitly. Let

    $\lambda_{0k}$ = rate of arrivals to station k from outside the system,
    $\lambda_k$ = rate of total arrivals to station k,
    $P_{kj}$ = probability that a customer leaving station k next visits station j.


Then, the arrivals to station k come from outside the system or from some
other station j. The departure rate from j equals the arrival rate to j,
the fraction $P_{jk}$ of which go to station k, whence

$$\lambda_k = \lambda_{0k} + \sum_j \lambda_j P_{jk}. \tag{5.3}$$
Since the network is acyclic, (5.3) may be solved recursively, beginning
with stations having only outside arrivals. The simple example that
follows will make the procedure clear.

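The arrivals to station k form a Poisson process of rate $\lambda_k$. Let

$$\psi_k(n) = \pi_{k0}\,\frac{\lambda_k^n}{\mu_{k1}\mu_{k2}\cdots\mu_{kn}} \quad \text{for } n = 1, 2, \ldots, \tag{5.4}$$

where

$$\psi_k(0) = \pi_{k0} = \left\{1 + \sum_{n=1}^{\infty} \frac{\lambda_k^n}{\mu_{k1}\mu_{k2}\cdots\mu_{kn}}\right\}^{-1}. \tag{5.5}$$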

Referring to (4.2) and (4.3) we see that (5.4) and (5.5) give the
stationary distribution for a queue having Poisson arrivals at rate
$\lambda_k$ and memoryless service times at rates $\mu_{kn}$ for
$n = 1, 2, \ldots$. Accordingly, we may now express (5.2) explicitly as

$$\Pr\{X_1(t) = n_1, X_2(t) = n_2, \ldots, X_K(t) = n_K\} = \psi_1(n_1)\psi_2(n_2)\cdots\psi_K(n_K). \tag{5.6}$$


Example Consider the three station network as shown in Figure 5.3.

[Diagram: Poisson arrivals from outside enter station 1 at rate
$\lambda_{01} = 4$; departures from the network are Poisson.]

Figure 5.3 A three station open acyclic network. Two servers, each of rate 3,
           at the first station give rise to the station rates
           $\mu_{11} = 3$ and $\mu_{1n} = 6$ for $n \ge 2$. Stations 2 and 3
           each have a single server of rate 2 and 6, respectively.
The first step in analyzing the example is to determine the arrival rates
at the various stations. In equilibrium, the arrival rate at a station must
equal its departure rate, as asserted in (d). Accordingly, departures from
station 1 occur at rate $\lambda_1 = 4$, and since these departures
independently travel to stations 2 and 3 with respective probabilities
$P_{12} = \frac{1}{3}$ and $P_{13} = \frac{2}{3}$, we determine the arrival
rate $\lambda_2 = (\frac{1}{3})4 = \frac{4}{3}$. At station 3 the arrivals
include both those from station 1 and those from station 2. Thus
$\lambda_3 = (\frac{2}{3})4 + (\frac{1}{3})4 = 4$.
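
The recursive solution of the traffic equations (5.3) for an acyclic network can be sketched as follows. This is an illustration, not the authors' procedure verbatim: the function name is hypothetical, the stations must be supplied in an order in which every station appears after all stations that feed it, and the routing probabilities 1/3 and 2/3 are our reading of the three station example.

```python
from fractions import Fraction


def solve_acyclic_traffic(outside, routing, order):
    """Solve lambda_k = lambda_0k + sum_j lambda_j * P_jk station by station.

    outside: dict mapping station -> outside arrival rate lambda_0k
    routing: dict mapping (j, k) -> routing probability P_jk
    order:   stations listed so that every feeder precedes its target
    """
    rates = {}
    for k in order:
        inflow = sum(
            rates[j] * p for (j, target), p in routing.items() if target == k
        )  # raises KeyError if `order` is not a valid upstream-first order
        rates[k] = outside.get(k, 0) + inflow
    return rates


if __name__ == "__main__":
    outside = {1: Fraction(4)}
    routing = {
        (1, 2): Fraction(1, 3),  # assumed reading of the example
        (1, 3): Fraction(2, 3),  # assumed reading of the example
        (2, 3): Fraction(1),
    }
    print(solve_acyclic_traffic(outside, routing, order=[1, 2, 3]))
```

Exact rational arithmetic is used so that the result can be compared directly with the fractions in the text.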

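Having determined the arrival rates at each station, we turn to
determining the equilibrium probabilities. Station 1 is an M/M/2 system
with $\lambda = 4$ and $\mu = 3$. From (2.16), or (5.4) and (5.5), we
obtain

$$\Pr\{X_1(t) = 0\} = \pi_0 = \left\{1 + \left(\frac{4}{3}\right) + \frac{(4/3)^2}{2(1/3)}\right\}^{-1} = 0.2$$

and

$$\Pr\{X_1(t) = n\} = \begin{cases} \left(\dfrac{4}{3}\right)(0.2) & \text{for } n = 1, \\[1ex] 0.4\left(\dfrac{2}{3}\right)^n & \text{for } n \ge 2. \end{cases}$$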


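Station 2 is an M/M/1 system with $\lambda = \frac{4}{3}$ and $\mu = 2$.
From (2.9) we obtain

$$\Pr\{X_2(t) = n\} = \left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^n \quad \text{for } n = 0, 1, \ldots.$$

Similarly station 3 is an M/M/1 system with $\lambda = 4$ and $\mu = 6$,
so that (2.9) yields

$$\Pr\{X_3(t) = n\} = \left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^n \quad \text{for } n = 0, 1, \ldots.$$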


Finally, according to Property (a), the queue lengths $X_1(t)$, $X_2(t)$,
and $X_3(t)$ are independent, so that

$$\Pr\{X_1(t) = n_1, X_2(t) = n_2, X_3(t) = n_3\} = \Pr\{X_1(t) = n_1\}\Pr\{X_2(t) = n_2\}\Pr\{X_3(t) = n_3\}.$$
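
The station distributions (5.4) and (5.5) can be evaluated numerically for state-dependent service rates. The sketch below is an illustration under stated assumptions: the infinite series in (5.5) is truncated after a fixed number of terms (adequate when the terms decay geometrically), the function name is hypothetical, and the arrival rates 4, 4/3, and 4 are our reading of the three station example.

```python
def station_pmf(lam, mu_of_n, n_max, series_terms=2000):
    """Return [psi(0), ..., psi(n_max)] from equations (5.4) and (5.5).

    lam:      total arrival rate at the station
    mu_of_n:  function giving the service rate when n customers are present
    The normalizing series is truncated at `series_terms` terms (assumption).
    """
    weights = [1.0]
    for n in range(1, series_terms + 1):
        weights.append(weights[-1] * lam / mu_of_n(n))
    pi0 = 1.0 / sum(weights)
    return [pi0 * w for w in weights[: n_max + 1]]


if __name__ == "__main__":
    # Station 1: two servers of rate 3 each, so mu_11 = 3 and mu_1n = 6, n >= 2.
    s1 = station_pmf(4.0, lambda n: 3.0 if n == 1 else 6.0, n_max=3)
    s2 = station_pmf(4.0 / 3.0, lambda n: 2.0, n_max=3)  # single server, rate 2
    s3 = station_pmf(4.0, lambda n: 6.0, n_max=3)        # single server, rate 6
    print("station 1:", [round(x, 4) for x in s1])
    # Property (a): the joint probability is the product of the marginals.
    print("Pr{network empty} =", s1[0] * s2[0] * s3[0])
```

For station 1 this reproduces the idle probability 0.2, and the three marginals multiply to give the joint probabilities of the network.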
5.4. Appendix: Time Reversibility


Let $\{X(t), -\infty < t < +\infty\}$ be an arbitrary countable-state
Markov chain having a stationary distribution $\pi_j = \Pr\{X(t) = j\}$
for all states j and all times t. Note that the time index set is the
whole real line. We view the process as having begun indefinitely far in
the past, so that it now is evolving in a stationary manner. Let
$Y(t) = X(-t)$ be the same process, but with time reversed. The stationary
process $\{X(t)\}$ is said to be time reversible if $\{X(t)\}$ and
$\{Y(t)\}$ have the same probability laws. Clearly,
$\Pr\{X(0) = j\} = \Pr\{Y(0) = j\} = \pi_j$, and it is not difficult to
show that both processes are Markov. Hence, in order to show that they
share the same probability laws it suffices to show that they have the
same transition probabilities. Let


$$P_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\},$$
$$Q_{ij}(t) = \Pr\{Y(t) = j \mid Y(0) = i\}.$$

The process $\{X(t)\}$ is reversible if

$$P_{ij}(t) = Q_{ij}(t) \tag{5.7}$$

for all states i, j and all times t. We evaluate $Q_{ij}(t)$ as follows:

$$\begin{aligned}
Q_{ij}(t) &= \Pr\{Y(t) = j \mid Y(0) = i\} \\
&= \Pr\{X(-t) = j \mid X(0) = i\} \\
&= \Pr\{X(0) = j \mid X(t) = i\} \qquad \text{(by stationarity)} \\
&= \frac{\Pr\{X(0) = j, X(t) = i\}}{\Pr\{X(t) = i\}} \\
&= \frac{\pi_j P_{ji}(t)}{\pi_i}.
\end{aligned}$$

In conjunction with (5.7) we see that the process $\{X(t)\}$ is
reversible if

$$P_{ij}(t) = Q_{ij}(t) = \frac{\pi_j P_{ji}(t)}{\pi_i},$$

or

$$\pi_i P_{ij}(t) = \pi_j P_{ji}(t), \tag{5.8}$$

for all states i, j and all times t.
As a last step, we determine a criterion for reversibility in terms of the
infinitesimal parameters

$$a_{ij} = \lim_{t \downarrow 0} \frac{1}{t}\Pr\{X(t) = j \mid X(0) = i\}, \quad i \ne j.$$

It is immediate that (5.8) holds when i = j. When $i \ne j$,

$$P_{ij}(t) = a_{ij}t + o(t), \tag{5.9}$$

which substituted into (5.8) gives

$$\pi_i[a_{ij}t + o(t)] = \pi_j[a_{ji}t + o(t)],$$

and after dividing by t and letting t vanish, we obtain the criterion

$$\pi_i a_{ij} = \pi_j a_{ji} \quad \text{for all } i \ne j. \tag{5.10}$$

When the transition probabilities are determined by the infinitesimal
parameters, we deduce that the process $\{X(t)\}$ is time reversible
whenever (5.10) holds.

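All birth and death processes satisfying VI, (3.5) and having stationary
distributions are time reversible! Because birth and death processes have

$$a_{i,i+1} = \lambda_i, \qquad a_{i,i-1} = \mu_i,$$

and

$$a_{ij} = 0 \quad \text{if } |i - j| > 1,$$

in verifying (5.10) it suffices to check that

$$\pi_i a_{i,i+1} = \pi_{i+1} a_{i+1,i},$$

or

$$\pi_i \lambda_i = \pi_{i+1}\mu_{i+1} \quad \text{for } i = 0, 1, \ldots. \tag{5.11}$$

But [see VI, (4.6) and (4.7)],

$$\pi_i = \pi_0\left(\frac{\lambda_0\lambda_1\cdots\lambda_{i-1}}{\mu_1\mu_2\cdots\mu_i}\right) \quad \text{for } i = 1, 2, \ldots,$$

whence (5.11) becomes

$$\pi_0\left(\frac{\lambda_0\lambda_1\cdots\lambda_{i-1}}{\mu_1\mu_2\cdots\mu_i}\right)\lambda_i = \pi_0\left(\frac{\lambda_0\lambda_1\cdots\lambda_i}{\mu_1\mu_2\cdots\mu_{i+1}}\right)\mu_{i+1},$$

which is immediately seen to be true.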
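
Criterion (5.10) can be checked numerically on a finite chain. The sketch below is illustrative: it builds the stationary distribution of a finite birth and death chain from the product formula, tests $\pi_i a_{ij} = \pi_j a_{ji}$ for every pair of states, and contrasts the result with a three-state chain that cycles in one direction only, a standard example of a chain that is not reversible. All names and rates are hypothetical.

```python
def birth_death_stationary(births, deaths):
    """Stationary distribution of a finite birth and death chain.

    births[i] is lambda_i for i = 0..N-1 and deaths[i] is mu_{i+1}.
    """
    weights = [1.0]
    for lam, mu in zip(births, deaths):
        weights.append(weights[-1] * lam / mu)
    total = sum(weights)
    return [w / total for w in weights]


def satisfies_detailed_balance(pi, rate, tol=1e-12):
    """Check pi_i * a_ij == pi_j * a_ji for all i != j, as in (5.10)."""
    states = range(len(pi))
    return all(
        abs(pi[i] * rate(i, j) - pi[j] * rate(j, i)) <= tol
        for i in states
        for j in states
        if i != j
    )


if __name__ == "__main__":
    births = [2.0, 2.0, 1.0]   # lambda_0, lambda_1, lambda_2
    deaths = [3.0, 1.5, 4.0]   # mu_1, mu_2, mu_3

    def bd_rate(i, j):
        if j == i + 1:
            return births[i]
        if j == i - 1:
            return deaths[j]
        return 0.0

    pi = birth_death_stationary(births, deaths)
    print(satisfies_detailed_balance(pi, bd_rate))

    # One-directional cycle 0 -> 1 -> 2 -> 0 with unit rates: the uniform
    # distribution is stationary, but detailed balance fails.
    cycle_rate = lambda i, j: 1.0 if j == (i + 1) % 3 else 0.0
    print(satisfies_detailed_balance([1 / 3, 1 / 3, 1 / 3], cycle_rate))
```

The first check succeeds for any positive birth and death rates, which is the content of (5.11); the second fails because probability flows around the cycle in one direction only.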


5.5. Proof of Theorem 5.1 


Let us consider a birth and death process $\{X(t)\}$ having the constant
birth rate $\lambda_k = \lambda$ for $k = 0, 1, \ldots$, and arbitrary
death parameters $\mu_k > 0$ for $k = 1, 2, \ldots$. This process
corresponds to a memoryless server queue having Poisson arrivals. A
typical evolution is illustrated in Figure 5.4. The arrival process for
$\{X(t)\}$ is a Poisson process of rate $\lambda$. The reversed time
process $Y(t) = X(-t)$ has the same probabilistic laws as does
$\{X(t)\}$, so the arrival process for $\{Y(t)\}$ also must be a Poisson
process of rate $\lambda$. But the arrival process for $\{Y(t)\}$ is the
departure process for $\{X(t)\}$ (see Figure 5.4). Thus it must be that
these departure instants also form a Poisson process of rate $\lambda$. In
particular, if $D(t)$ counts the departures in the $X(\cdot)$ process over
the duration $(0, t]$, then

$$\Pr\{D(t) = j\} = \frac{e^{-\lambda t}(\lambda t)^j}{j!} \quad \text{for } j = 0, 1, \ldots. \tag{5.12}$$


[Graph: a sample path of X(t) against t. Below it, one time axis marks the
arrivals for X(·), which are the departures for Y(·), and a second marks
the departures for X(·), which are the arrivals for Y(·).]

Figure 5.4 A typical evolution of a queueing process. The instants of arrivals
           and departures have been isolated on two time axes below the graph.
Moreover, looking at the reversed process $Y(-t) = X(t)$, the "future"
arrivals for $Y(-t)$ in the Y duration $[-t, 0)$ are independent of
$Y(-t) = X(t)$. (See Figure 5.4.) These future arrivals for $Y(-t)$ are
the departures for $X(\cdot)$ in the interval $(0, t]$. Therefore, these
departures and $X(t) = Y(-t)$ must be independent. Since
$\Pr\{X(t) = k\} = \pi_k$, by the assumption of stationarity, the
independence of $D(t)$ and $X(t)$ and (5.12) give

$$\Pr\{X(t) = k, D(t) = j\} = \Pr\{X(t) = k\}\Pr\{D(t) = j\} = \frac{\pi_k e^{-\lambda t}(\lambda t)^j}{j!},$$

and the proof of Theorem 5.1 is complete.


Exercises 


5.1. Consider the three server network pictured here: 


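[Network diagram. Recoverable labels:
    Arrivals: Poisson, rate $\lambda = 2$
    Server #1: $\mu_{1n} = 6$, $n \ge 1$
    Server #2: $\mu_{2n} = 3$, $n \ge 1$
    Server #3: $\mu_{3n} = 2$, $n \ge 1$
    One branch is labeled "Departs with probability .8"]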


In the long run, what fraction of time is server #2 idle while, simultane- 
ously, server #3 is busy? Assume that all service times are exponentially 
distributed. 


5.2. Refer to the network of Exercise 5.1. Suppose that Server #2 and 
Server #3 share a common customer waiting area. If it is desired that the 
total number of customers being served and waiting to be served not ex- 
ceed the waiting area capacity more than 5% of the time in the long run, 
how large should this area be? 
Problems


5.1. Suppose three service stations are arranged in tandem so that the
departures from one form the arrivals for the next. The arrivals to the
first station are a Poisson process of rate $\lambda = 10$ per hour. Each
station has a single server, and the three service rates are
$\mu_1 = 12$ per hour, $\mu_2 = 20$ per hour, and $\mu_3 = 15$ per hour.
In-process storage is being planned for station 3. What capacity $C_3$
must be provided if in the long run, the probability of exceeding $C_3$ is
to be less than or equal to 1 percent? That is, what is the smallest
number $C_3 = c$ for which $\lim_{t \to \infty} \Pr\{X_3(t) > c\} \le 0.01$?


6. General Open Networks 


The preceding section covered certain memoryless queueing networks in
which a customer could visit any particular server at most once. With this
assumption, the departures from any service station formed a Poisson
process that was independent of the number of customers at that station in
steady state. As a consequence, the numbers
$X_1(t), X_2(t), \ldots, X_K(t)$ of customers at the K stations were
independent random variables, and the product form solution expressed in
(5.2) prevailed.

The situation where a customer can visit a server more than once is
more subtle. On the one hand, many flows in the network are no longer
Poisson. On the other hand, rather surprisingly, the product form solution
of (5.2) remains valid.


Example To begin our explanation, let us first reexamine the simple 
feedback model of Section 4.3. The flow is depicted in Figure 6.1. The ar- 
rival process is Poisson, but the input to the server is not. (The distinction 
between the arrival and input processes is made in Figures 1.2 and 4.1.) 


[Diagram: arrivals (Poisson) merge with the feedback stream (not Poisson)
to form the server input (not Poisson). Each customer in the server output
(not Poisson) is fed back with probability p or departs with probability q.]

Figure 6.1 A single server with feedback.
The output process, as shown in Figure 6.1, is not Poisson, nor is it
independent of the number of customers in the system. Recall that each
customer in the output is fed back with probability p and departs with
probability q = 1 - p. In view of this non-Poisson behavior, it is
remarkable that the distribution of the number of customers in the system
is the same as that in a Poisson M/M/1 system whose input rate is
$\lambda/q$ and whose service rate is $\mu$, as verified in (4.4).


Example Let us verify the product form solution in a slightly more 
complex two server network, depicted in Figure 6.2. 


[Diagram: Poisson arrivals at rate $\lambda$ enter Server #1. Each customer
leaving Server #1 departs with probability q or is fed back with
probability p through Server #2, which returns it to Server #1.]

Figure 6.2 A two server feedback system. For example, server #2 in this
           system might be an inspector returning a fraction p of the
           output for rework.


If we let $X_i(t)$ denote the number of customers at station i at time t,
for i = 1, 2, then $X(t) = [X_1(t), X_2(t)]$ is a Markov chain whose
transition rates are given in the following table:

    From State   To State           Transition Rate        Description
    (m, n)       (m + 1, n)         $\lambda$              Arrival of new customer
    (m, n)       (m + 1, n - 1)     $\mu_2$, $n \ge 1$     Input of feedback customer
    (m, n)       (m - 1, n)         $q\mu_1$, $m \ge 1$    Departure of customer
    (m, n)       (m - 1, n + 1)     $p\mu_1$, $m \ge 1$    Feedback to server #2
Let $\pi_{m,n} = \lim_{t \to \infty} \Pr\{X_1(t) = m, X_2(t) = n\}$ be the
stationary distribution of the process. Reasoning analogous to that of
(6.11) and (6.12) of VI (where the theory was developed for finite-state
Markov chains) leads to the following equations for the stationary
distribution:

$$\lambda\pi_{0,0} = q\mu_1\pi_{1,0}, \tag{6.1}$$

$$(\lambda + \mu_2)\pi_{0,n} = p\mu_1\pi_{1,n-1} + q\mu_1\pi_{1,n}, \quad n \ge 1, \tag{6.2}$$

$$(\lambda + \mu_1)\pi_{m,0} = \lambda\pi_{m-1,0} + q\mu_1\pi_{m+1,0} + \mu_2\pi_{m-1,1}, \quad m \ge 1, \tag{6.3}$$

$$(\lambda + \mu_1 + \mu_2)\pi_{m,n} = \lambda\pi_{m-1,n} + p\mu_1\pi_{m+1,n-1} + q\mu_1\pi_{m+1,n} + \mu_2\pi_{m-1,n+1}, \quad m, n \ge 1. \tag{6.4}$$


The mass balance interpretation as explained following (6.12) in VI
may help motivate (6.1) through (6.4). For example, the left side in (6.1)
measures the total rate of flow out of state (0, 0) and is jointly
proportional to $\pi_{0,0}$, the long run fraction of time the process is
in state (0, 0), and $\lambda$, the (conditional) transition rate out of
(0, 0). Similarly, the right side of (6.1) measures the total rate of flow
into state (0, 0).

Using the product form solution in the acyclic case, we will “guess” a 
solution and then verify that our guess indeed satisfies (6.1) through (6.4). 
First we need to determine the input rate, call it $\lambda_1$, to server
#1. In equilibrium, the output rate must equal the input rate, and of this
output, the fraction p is returned to join the new arrivals after visiting
server #2. We have

    Input Rate = New Arrivals + Feedback,

which translates into

$$\lambda_1 = \lambda + p\lambda_1,$$

or

$$\lambda_1 = \frac{\lambda}{q}. \tag{6.5}$$

The input rate to server #2 is

$$\lambda_2 = p\lambda_1 = \frac{p\lambda}{q}. \tag{6.6}$$


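The solution that we guess is to treat server #1 and server #2 as
independent M/M/1 systems having input rates $\lambda_1$ and $\lambda_2$,
respectively (even though we know from our earlier discussion that the
input to server #2, while of rate $\lambda_2$, is not Poisson). That is,
we attempt a solution of the form

$$\begin{aligned}
\pi_{m,n} &= \left(1 - \frac{\lambda_1}{\mu_1}\right)\left(\frac{\lambda_1}{\mu_1}\right)^m \left(1 - \frac{\lambda_2}{\mu_2}\right)\left(\frac{\lambda_2}{\mu_2}\right)^n \\
&= \left(1 - \frac{\lambda}{q\mu_1}\right)\left(\frac{\lambda}{q\mu_1}\right)^m \left(1 - \frac{p\lambda}{q\mu_2}\right)\left(\frac{p\lambda}{q\mu_2}\right)^n \quad \text{for } m, n \ge 0.
\end{aligned} \tag{6.7}$$

It is immediate that

$$\sum_{m=0}^{\infty}\sum_{n=0}^{\infty} \pi_{m,n} = 1,$$

provided that $\lambda_1 = (\lambda/q) < \mu_1$ and
$\lambda_2 = p\lambda/q < \mu_2$.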

We turn to verifying (6.1) through (6.4). Let
$\theta_{m,n} = (\lambda/q\mu_1)^m \times (p\lambda/q\mu_2)^n$. It
suffices to verify that $\theta_{m,n}$ satisfies (6.1) through (6.4),
since $\pi_{m,n}$ and $\theta_{m,n}$ differ only by the constant multiple
$\pi_{0,0} = (1 - \lambda_1/\mu_1) \times (1 - \lambda_2/\mu_2)$. Thus we
proceed to substitute $\theta_{m,n}$ into (6.1) through (6.4) and verify
that equality is obtained.

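We verify (6.1):

$$\lambda = q\mu_1\left(\frac{\lambda}{q\mu_1}\right) = \lambda.$$

We verify (6.2):

$$(\lambda + \mu_2)\left(\frac{p\lambda}{q\mu_2}\right)^n = p\mu_1\left(\frac{\lambda}{q\mu_1}\right)\left(\frac{p\lambda}{q\mu_2}\right)^{n-1} + q\mu_1\left(\frac{\lambda}{q\mu_1}\right)\left(\frac{p\lambda}{q\mu_2}\right)^n,$$

or after dividing by $(p\lambda/q\mu_2)^n$ and simplifying,

$$\lambda + \mu_2 = \left(\frac{p\lambda}{q}\right)\left(\frac{q\mu_2}{p\lambda}\right) + \lambda = \lambda + \mu_2.$$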
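We verify (6.3):

$$(\lambda + \mu_1)\left(\frac{\lambda}{q\mu_1}\right)^m = \lambda\left(\frac{\lambda}{q\mu_1}\right)^{m-1} + q\mu_1\left(\frac{\lambda}{q\mu_1}\right)^{m+1} + \mu_2\left(\frac{\lambda}{q\mu_1}\right)^{m-1}\left(\frac{p\lambda}{q\mu_2}\right),$$

which, after dividing by $(\lambda/q\mu_1)^m$, becomes

$$\lambda + \mu_1 = \lambda\left(\frac{q\mu_1}{\lambda}\right) + q\mu_1\left(\frac{\lambda}{q\mu_1}\right) + \mu_2\left(\frac{q\mu_1}{\lambda}\right)\left(\frac{p\lambda}{q\mu_2}\right),$$

or

$$\lambda + \mu_1 = q\mu_1 + \lambda + p\mu_1 = \lambda + \mu_1.$$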


The final verification, that $\theta_{m,n}$ satisfies (6.4), is left to
the reader as Exercise 6.1.
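
The algebraic verification can be complemented by a numerical check of the balance equations. The sketch below is illustrative only: the function names and the sample parameter values are assumptions. It evaluates the flow out of each state minus the flow into it, using the transition rates of the two server feedback system, and covers all four cases (6.1) through (6.4) at once by treating $\theta_{m,n}$ as zero when an index is negative.

```python
def theta(m, n, lam, mu1, mu2, p):
    """Unnormalized product form: (lam/(q mu1))^m * (p lam/(q mu2))^n."""
    if m < 0 or n < 0:
        return 0.0  # states outside the state space carry no probability
    q = 1.0 - p
    return (lam / (q * mu1)) ** m * (p * lam / (q * mu2)) ** n


def balance_residual(m, n, lam, mu1, mu2, p):
    """Flow out of state (m, n) minus flow into it under theta."""
    q = 1.0 - p
    th = lambda a, b: theta(a, b, lam, mu1, mu2, p)
    out_rate = lam + (mu1 if m >= 1 else 0.0) + (mu2 if n >= 1 else 0.0)
    inflow = (
        lam * th(m - 1, n)             # arrival of a new customer
        + q * mu1 * th(m + 1, n)       # departure from server #1
        + p * mu1 * th(m + 1, n - 1)   # feedback to server #2
        + mu2 * th(m - 1, n + 1)       # return from server #2
    )
    return out_rate * th(m, n) - inflow


if __name__ == "__main__":
    # Hypothetical parameters with lam/q < mu1 and p*lam/q < mu2.
    params = dict(lam=1.0, mu1=3.0, mu2=2.0, p=0.25)
    worst = max(
        abs(balance_residual(m, n, **params))
        for m in range(8)
        for n in range(8)
    )
    print("largest balance residual:", worst)
```

A residual at the level of floating-point rounding for every state is consistent with $\theta_{m,n}$ solving (6.1) through (6.4); it does not replace the algebraic argument.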


6.1. The General Open Network 


Consider an open queueing network having K service stations, and let
$X_k(t)$ denote the number of customers at station k at time t. We assume
that

1. The arrivals from outside the network to distinct servers form
   independent Poisson processes, where the outside arrivals to station k
   occur at rate $\lambda_{0k}$.

2. The departures from distinct servers independently travel instantly
   to other servers, or leave the system, with fixed probabilities, where
   the probability that a departure from station j travels to station k is
   $P_{jk}$.

3. The service times are memoryless, or Markov, in the sense that

   $$\Pr\{\text{Server \#}k \text{ completes a service in } (t, t + \Delta t] \mid X_k(t) = n\} = \mu_{kn}(\Delta t) + o(\Delta t) \quad \text{for } n = 1, 2, \ldots, \tag{6.8}$$

   and does not otherwise depend on the past.

4. The system is in statistical equilibrium (stationary).

5. The system is completely open in that all customers in the system
   eventually leave.


Let $\lambda_k$ be the rate of input at station k. The input at station k
is composed of customers entering from outside the system, at rate
$\lambda_{0k}$, plus customers traveling from (possibly) other stations.
The input to station k from station j occurs at rate $\lambda_j P_{jk}$,
whence, as in (5.3),

$$\lambda_k = \lambda_{0k} + \sum_{j=1}^{K} \lambda_j P_{jk} \quad \text{for } k = 1, \ldots, K. \tag{6.9}$$

Condition 5 above, that all entering customers eventually leave, ensures
that (6.9) has a unique solution.


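With $\lambda_1, \ldots, \lambda_K$ given by (6.9), the main result is the
product form solution

$$\Pr\{X_1(t) = n_1, X_2(t) = n_2, \ldots, X_K(t) = n_K\} = \psi_1(n_1)\psi_2(n_2)\cdots\psi_K(n_K), \tag{6.10}$$

where

$$\psi_k(n) = \frac{\pi_{k0}\lambda_k^n}{\mu_{k1}\mu_{k2}\cdots\mu_{kn}} \quad \text{for } n = 1, 2, \ldots, \tag{6.11}$$

and

$$\psi_k(0) = \pi_{k0} = \left\{1 + \sum_{n=1}^{\infty} \frac{\lambda_k^n}{\mu_{k1}\mu_{k2}\cdots\mu_{kn}}\right\}^{-1}. \tag{6.12}$$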


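Example The example of Figure 6.1 (see also Section 4.3) corresponds
to K = 1 (a single service station) for which $P_{11} = p < 1$. The
external arrivals are at rate $\lambda_{01} = \lambda$, and (6.9) becomes

$$\lambda_1 = \lambda_{01} + \lambda_1 P_{11}, \quad \text{or} \quad \lambda_1 = \lambda + \lambda_1 p,$$

which gives $\lambda_1 = \lambda/(1 - p) = \lambda/q$. Since the example
concerns a single server, then $\mu_{1n} = \mu$ for all n, and (6.11)
becomes

$$\psi(n) = \pi_0\left(\frac{\lambda_1}{\mu}\right)^n = \pi_0\left(\frac{\lambda}{q\mu}\right)^n,$$

where

$$\pi_0 = 1 - \frac{\lambda}{q\mu},$$

in agreement with (4.4).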
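Example Consider next the two server example depicted in Figure 6.2.
The data given there furnish the following information:

$$\lambda_{01} = \lambda, \qquad \lambda_{02} = 0,$$
$$P_{11} = 0, \qquad P_{12} = p,$$
$$P_{21} = 1, \qquad P_{22} = 0,$$

which substituted into (6.9) gives

$$\lambda_1 = \lambda + \lambda_2(1),$$
$$\lambda_2 = 0 + \lambda_1(p),$$

which readily yields

$$\lambda_1 = \frac{\lambda}{q} \quad \text{and} \quad \lambda_2 = \frac{p\lambda}{q},$$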

in agreement with (6.5) and (6.6). It is readily seen that the product solu- 
tion of (6.10) through (6.12) is identical with (6.7), which was directly 
verified as the solution in this example. 
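
For a general open network the traffic equations (6.9) form a linear system, which can be solved mechanically. The following sketch is one possible approach, not taken from the text: it rewrites (6.9) as $\lambda_k - \sum_j \lambda_j P_{jk} = \lambda_{0k}$ and applies Gauss-Jordan elimination in exact rational arithmetic. The function name and the sample values $\lambda = 2$, $p = 1/4$ are assumptions.

```python
from fractions import Fraction


def solve_traffic_equations(outside, routing):
    """Solve lambda_k = outside[k] + sum_j lambda_j * routing[j][k].

    outside: list of outside arrival rates lambda_0k
    routing: K x K matrix with routing[j][k] = P_jk
    Raises ValueError if the system has no unique solution.
    """
    size = len(outside)
    # Augmented matrix of (I - P^T) lambda = outside.
    rows = [
        [Fraction(int(j == k)) - Fraction(routing[j][k]) for j in range(size)]
        + [Fraction(outside[k])]
        for k in range(size)
    ]
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("traffic equations have no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[k][size] for k in range(size)]


if __name__ == "__main__":
    p = Fraction(1, 4)
    # Two server feedback system: P_12 = p, P_21 = 1, outside arrivals at server #1.
    print(solve_traffic_equations([2, 0], [[0, p], [1, 0]]))
    # Single server with feedback: P_11 = p.
    print(solve_traffic_equations([2], [[p]]))
```

With $\lambda = 2$ and $p = 1/4$ the solver returns $\lambda_1 = \lambda/q = 8/3$ and $\lambda_2 = p\lambda/q = 2/3$, matching (6.5) and (6.6).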


Exercises 


6.1. In the case $m \ge 1$, $n \ge 1$, verify that $\theta_{m,n}$ as given
following (6.7) satisfies the equation for the stationary distribution
(6.4).


Problems 


6.1. Consider the three server network pictured here: 
[Network diagram. Recoverable labels:
    Arrivals: Poisson, rate $\lambda = 2$
    Server #1
    Server #2: $\mu_{2n} = 3$, $n \ge 1$
    Server #3: $\mu_{3n} = 2$, $n \ge 1$
    One branch is labeled "Departs with probability .8"]

In the long run, what fraction of the time is server #2 idle while, simulta- 
neously, server #3 is busy? Assume that the system satisfies assumptions 
(1) through (5) of a general open network. 
Answers to Exercises


Chapter I


2.1. 


Because B and B* are disjoint events whose union is the whole sam- 


ple space, the law of total probability (Section 2.1) applies to give the de- 
sired formula. 


2.3. 


2.4. 


2.7. 


2.8. 


(b) 0 for x = 0; 
Sx) = 43x forO0<x< 1; 


0 forx = 1. 


(c) Var[Z 


eed 
I 
ale 


e 


(a) 0 for x < 0; 
F(x) = 4x* forOSx<=1; 
l for 1 <x. 


(b) E[X] = R/A1 + R). 
(c) Var[X] = R/[(R + 2)(R + 1)’). 


ftv) = AQ -— v4" for OSvsl; 
E[V] = 1A + 1); 
Var[V] = A/[(A + 2)(A + 1)’]. 
2.9.


3.1. 


3.2. 


3.3. 


3.4. 


3.5. 


3.6. 


4.1. 


4.2. 


4.3. 


4.4. 




0 for x < 0; 
5x? <x<1: 

F(x) =? | . forO=xs1; 
1 — (2 -— x) forl<x=2,; 
l for x > 2. 


E[X] = 1; Var[X] = 3. 
Pr{X = 3} = ¥%. 


Pr{0 defective} = 0.3151.
Pr{0 or 1 defective} = 0.9139.


Pr{N = 10} = 0.0315. 


Pr{X = 2} = 2e°* = 0.2707. 
Pr{X = 2} = 5e* = 0.6767. 


Pr{X = 8} = 0.1334. 


(a) M n+l variance = ~—+ 
= ——; e= 
a) Mean 5 7) 
(b) ml form =0,...,N; 
n 
Pr{Z = m} = 7 
an+ilam form=nt+1,...,2n. 
n? 
1+2(n—k 
(c) PU =k) =O for k=0O,...,nN. 
Pr{X > 1.5} = e3 = 0.0498. 
Pr{X = 1.5} = 0. 


] l 
Median = — log 2; Mean = —. 
edian = > og ean = > 
Exponential distribution with parameter A/2.54. 


Mean = QO; Variance = 1. 




4.5. 


4.6. 


4.7. 


5.1. 


9.2. 


5.3. 


5.4. 


5.5. 


io) 
Oy: O7»O0y 
até = ss FD > for p# +1. 
Txt Oy POO; 


(a) f(y) =e" for y=O. 


1 | (n—1)én 
(b) fy(w) = -{—) for O<w<l. 
n\w 


R*nas the gamma density f,(r) = Are” = for r>0. 


= 0.6835938 
= 0.2617188 
= 0.0507812 
= 0.0039062. 


VV WV W 


PS ms PO OX 
hwWN 


(a) E[X,] = 2; E[Xs] = 33 
(b) E[min{X,, X,}] = 5; 

(c) Pr{X, <X bh = 3 

(d) E[X, — X,|X,<X,] = 4. 


(a) Pr{Naomi is last} = % 
(b) Pr{Naomi 1 1S last} = Be = 0.1128; 
(c) c=2+ V3. 


Chapter II


1.1. 


1.2. 


1.3. 


Pr{2 nickel heads|N =4)= 


Pr{X > 1X = 1} = 0.122184; 
Pr{X > 1)Ace of spades} = 0.433513. 


1.4. Pr{X = 2} = 0.2204. 
e+ = 


en _ en . 


1.5. E[X|X is odd] = a( 


16. Pr{U=u,Z=z} = pl — pp», Osusz 


] 
Pr{U = ulZ =n} = ; OSucHn. 
| n+ 1 


2.1. Pr{Game ends in a 4} = j. 
2.3. Pr{Win} = 0.468984. 


3.1. k Pr{Z=k} E(Z) =i; 

0 0.16406 Var[Z] = 1.604167. 
] 0.31250 

2 0.25781 

3 0.16667 

4 0.07552 

5 0.02083 

6 


0.00260 


3.2. E[Z] = 3; Var[Z] = 3; 
Pr{Z = 2} = 0.29663. 


3.3. E[Z] = pw; Var[Z] = wl + por. 


3.4. Pr{X = 2} = 0.2204; 
E[X] = 2.92024. 


3.5. E[Z] = 6; Var[Z] = 26. 
4.14. P(X =2) =1. 
4.2. Pr{System operates} = 3. 


4.3. PrU>!)=1—-1(1 + log 2) = 0.1534. 


44. f(z) = for 0<z<~, 


] 
(1 + z) 
4.5. $f_{U,V}(u, v) = e^{-(u+v)}$ for u > 0, v > 0.


5.1. x ; 1 2 
Pr{X > x} 0.61 0.37 0.14 
2 


1 
— E[X] 1 1 
X 


5.2. Pr(xX=1} = E[X] =p. 


Chapter III

1.1. 0. 

1.2. 0.12, 0.12. 
1.3. 0.03. 

1.4. 0.02, 0.02. 
1.5. 0.025, 0.0075. 


2.1. (a) $$P^2 = \begin{pmatrix} 0.47 & 0.13 & 0.40 \\ 0.42 & 0.14 & 0.44 \\ 0.26 & 0.17 & 0.57 \end{pmatrix}.$$


(b) 0.13. 
(c) 0.16. 

2.2. n 0 1 2 3 4 
Pr{X¥,=0|X%,=0} 1 0 §$ | 3 


2.3. 0.264, 0.254. 
2.4. 0.35. 


2.5. 0.27, 0.27. 
2.6. 0.42, 0.416.
3.1.             -1     0     1     2     3
           -1 |   0     0    0.3   0.3   0.4
            0 |   0     0    0.3   0.3   0.4
      P =   1 |  0.3   0.3   0.4    0     0
            2 |   0    0.3   0.3   0.4    0
            3 |   0     0    0.3   0.3   0.4
a2. n=(Bo+ (at 
Pist = (= — “)ps 
N 
Pans = (— q. 
N 
3.3.             -1     0     1     2     3
           -1 |   0     0    0.1   0.4   0.5
            0 |   0     0    0.1   0.4   0.5
      P =   1 |  0.1   0.4   0.5    0     0
            2 |   0    0.1   0.4   0.5    0
            3 |   0     0    0.1   0.4   0.5
3.4. 2 -1 0 1 2 3 
~2 0 02 03 04 O1 


02 03 04 0.1 0 
3.5.


4.1. 


4.2. 


4.3. 


4.4. 


4.5. 


4.6. 


4.7. 


4.8. 


4.9. 


5.1. 


5.2.


~ 

I 
Oo xe 
—=_=_ Oo —_ — 
Oop O&O N 


w,, = 1.290; w,, = 0.323 


v, = 1.613. 

Uy = = = 0.40909 ---; 
P® = 0.17; 
PX = 0.2658; 


P® = 0.35762 +++; 
P'S = 0.40245 «+ 


0.71273. 


(a) 0.8044; 0.99999928 ...
(b) 0.3578; 0.00288 ....
5.3.


5.4. 


5.5. 


5.6. 


5.7. 


5.8. 


5.9. 


6.1. 


P^2 = | 0.58     0.42    |
      | 0.49     0.51    |

P^3 = | 0.526    0.474   |
      | 0.553    0.447   |

P^4 = | 0.5422   0.4578  |
      | 0.5341   0.4659  |

P^5 = | 0.53734  0.46266 |
      | 0.53977  0.46023 |
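The successive powers listed above can be regenerated by repeated matrix multiplication. The one-step matrix is not printed in this answer key; the sketch assumes P = [[0.4, 0.6], [0.7, 0.3]], a choice inferred because its powers reproduce the listed entries. The helper `matmul` is illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Assumed one-step transition matrix (inferred, not stated in the text).
P = [[0.4, 0.6], [0.7, 0.3]]

powers = {1: P}
for n in range(2, 6):
    powers[n] = matmul(powers[n - 1], P)

for n in range(2, 6):
    print(n, [[round(x, 5) for x in row] for row in powers[n]])
```

The rows of the powers approach a common vector as n grows, which is the behavior the sequence of answers illustrates.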
0 1 2 3 
Oy] 4 0 0 
pif: oF 0 
211} 0 0 3 
310 0 0 1 


P®. = 0820022583. 
2.73. 
Wy) = 0.3797468. 


Po = %% =1—-— a; 


p, = al — B), 4, = BU — a), 


r=aB+(1-—a) — Bp), for 121. 


Po = 1, d = 9, 
P=?P»9= 9,7 = 0 for i=l. 


(a) Ws = 3 


0 0-[-Y-(l 


6.2. 


6.3. 


6.4. 


7.1. 


7.2. 


8.1. 


8.2. 


8.3. 


8.4. 


9.1. 


9.2. 


9.3. 


9.4. 


Uy = 0.65. 
vy = 2152.777 ---. 


v, = 2.1518987. 


Is =Is 


| 100/79    70/79 |
|  30/79   100/79 |

(a) Uy = o3 
(b) w,, = 393 Wi = i: 


M(n) = 1, V(n) = n.


μ = b + 2c, σ^2 = b + 4c - (b + 2c)^2.


n 1 2 3 4 5 

u, 0.5 0.625 0.695 0.742 0.775 
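The values u_n in the table above behave like generation-by-generation extinction probabilities, which satisfy u_n = φ(u_{n-1}) with u_0 = 0, where φ is the offspring generating function. The offspring law is not given in this answer key; the sketch assumes φ(s) = 1/2 + s^2/2 (zero or two offspring, each with probability 1/2), chosen because it reproduces the printed row. The names are illustrative.

```python
def extinction_by_generation(phi, generations):
    """Iterate u_n = phi(u_{n-1}) starting from u_0 = 0."""
    u, values = 0.0, []
    for _ in range(generations):
        u = phi(u)
        values.append(u)
    return values


def phi(s):
    """Assumed offspring generating function: Pr{0} = Pr{2} = 1/2."""
    return 0.5 + 0.5 * s * s


u_values = extinction_by_generation(phi, 5)
print([round(u, 3) for u in u_values])
```

Because the assumed mean offspring number is 1, the iterates keep creeping toward 1, which is consistent with the slowly increasing values in the table.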
M(n) = λ^n, V(n) = λ^n (1 - λ^n)/(1 - λ), λ ≠ 1,

n | 2 3 4 5 

u, 0.333 0.480 0.564 0.619 0.658 

u,, = 0.82387. 


G(s) = Po + ps 


Qs) = p + qs”. 
Chapter IV


11. m= 
1.2. π_0 = 31/66, π_1 = 16/66, π_2 = 19/66.


1.3. 7, = }%. 


1.5. π_0 = 10/29, π_1 = 5/29, π_2 = 5/29, π_3 = 9/29.


1.6. π_0 = 5/14, π_1 = 6/14, π_2 = 3/14.


1.7. π_0 = 140/441, π_1 = 40/441, π_2 = 135/441, π_3 = 126/441.


1 10. Mate = i 
2.1. 7 =3 
g 
2.2. One facility: Pr{Idle} = > 
1 + p* 
l 
Two facilities: Pr{Idle} = , 
l+pt+p 
2.3. (a)  p     0      0.02    0.04    0.06    0.08    0.10
          AFI   0.10   0.11    0.12    0.13    0.14    0.16
          AOQ   0      0.018   0.036   0.054   0.072   0.090

     (b)  p     0      0.02    0.04    0.06    0.08    0.10
          AFI   0.20   0.23    0.27    0.32    0.37    0.42
          AOQ   0      0.016   0.032   0.048   0.064   0.080


2.4. p 0.05 0.10 0.15 0.20 0.25 
R, 0.998 0.990 0.978 0.962 0.941 
R, 0.998 0.991 0.981 0.968 0.952 
2.5. m7, =i.
2.6. m7 =}. 
2.7. (a) 0.6831; 
(b) m7, = 1 = 3, Ts = 5 
2.8. 7, = 4. 
3.1. {n=1;PH> 0} = {5, 8, 10, 13, 15, 16, 18, 20, 21, 23, 24, 25, 26, 
10) I Po?=0,P>0 for alli, j. 


3.2. Transient states: {0, 1, 3}. 
Recurrent states: {2, 4, 5}. 


3.3. (a) {0,2}, {1,3}, {4, 5}; 
(b) {0}, {5}, {1, 2}, {3, 4}.


3.4. {0}, 


4.1. π_k = p^k/(1 + p + p^2 + p^3 + p^4) for k = 0, ..., 4.


4.2. (a) ™ = ae 
(b) my = rece 


4.3. m= 7, = 0.2, 7, = 7, = 0.3. 


5.1. lim PY = lim PY} = 0.4; 
lim PS = lim p® = 0: 


lim PS? = 0.4. 

5.2. (a) 3, (e) 7, 
(b) 0, (f) X, 
(c) 4, (g) 3, 


(d) 5, (h) 3 
Chapter V


1.1. 


1.2. 


1.3. 


1.4. 


1.5. 


1.6. 


1.7. 


1.8. 


1.9. 


2.1. 


(a) e™; 
(b) e7?. 


(p,/py-\) = AMk, 


k 
Pr{X = k|N = n} =( 


(Atke~™ 


(a) a” 


(b) Pr{x(t) =n +k 




=0,1,... 


n 8 4 
"pr — py" p= 


k=0,1,...; 


_ t= shew? 


X(s) = n} 


E[X(t)X(s)] = λ^2 ts + λs.        k!


Pr{X = k} = (1 - p)p^k for k = 0, 1, ..., where p = 1/(1 + θ).


(a) e's 


(b) Exponential, parameter A = 3. 


(a) 2e7*; 
(b) Ye’; 
(c) (3)G)G)"s 


       k     0        1        2
(a)        0.290    0.370    0.225
(b)        0.296    0.366    0.221
(c)        0.301    0.361    0.217
2.2. Law of rare events, e.g. (a) Many potential customers who could
enter store, small probability for each to actually enter.


2.3. The number of distinct pairs is large; the probability of any
particular pair being in the sample is small.


2.4. Pr{Three pages error free} ~ e7'”. 


3.1. e°. 

3.2. (a) e°—e”; 
(b) 4e7*. 

3.3. i. 


3.4. (3)G)°G) = 

38. ("f(r 2). mote 
m/\T T 

3.6. F(t) =(1 —e%. 


2 
/. + ~~, 
3 t N 


3.8. (5)@)*@)’. 


r—-l kao 
3.9. Priw.<ye1-y Wee 
; kzo— OK! 
41. —— 
4.2. |. 
43. 2. 


4.4. See equation (4.7). 


1 —-— -aS 
4.5. [1 - . | 


a 
5.1.


5.2. 


5.3. 


6.1. 


6.2. 




0.9380. 


0.05216. 


0.1548. 


0.0205. 


At 
Mean = 3 Variance = 


2 e 


e@ 7 AGC _ e”™ 


6.3. joe 
6.4. (a) i: 
(b) x. 
6.5. Pr{M(t) = k} = Λ(t)^k e^{-Λ(t)}/k!, where Λ(t) = λ ∫_0^t [1 - G(u)] du.

Chapter VI 

1.1. PO) =e 
P\(t) = je! — 5; 

P(t) = 3[je! + je" — 3e™9 

P,(t) = 6[ge7' + je7* — je°7% — te7*]. 
1.2. (a) 4; 

(b) 2; 

(c) 5%. 

1.3. X(t) is a Markov process (memoryless property of exponential
distribution) for which the sojourn time in state k is exponentially
distributed with parameter λk.

1.4. (a) Pr{X = 0} = (1 - αh)^n = 1 - nαh + o(h);
     (b) Pr{X = 1} = nαh(1 - αh)^{n-1} = nαh + o(h);
     (c) Pr{X ≥ 2} = 1 - Pr{X = 0} - Pr{X = 1} = o(h).
1.5. E[X()) = ‘ Var[X(t)] = = p=e®.
1.6. (a) Pd) =e; 

(b) P(t) = S[ge~* — 4e7*] 

(c) P(t) = 15[59e7* — ge7* + gee”). 


2.1. P,(t) = e*; 
P,(t) =e" — 3e°7; 
P,(t) = 10[e-* + 4e-* — 3e-*); 
P(t) = 1 — P(t) — Pt) — P(t). 


2.2. (a) 3 308 
(b) 1; 
(C) Spo: 


2.3. P,(t)=e"'; 
P,(t) = [e' — e-*); 
P\(t) = 2[ge' — e7™ + 3e7*); 
P(t) = 1— P(t) — P(t) — P,(d. 
2.4. P,(t) = 10e-“(1 — e-). 
3.1. λ_n = λ, μ_n = nμ for n = 0, 1, ....


3.2. Assume that Pr{Particular patient exits in [t, t + h) | k patients}
     = (1/m_k) h + o(h).
42. 7,= exe — p)* where p= 7 ; B 
I's -A 
43. 7,= “ where A= < 


k! 


B 
4.4. (a) m=1 / (1 + “ r (2) }: 


oy nri/(iede(4) 


45. m,=(k+ 10 — 07e&. 
Qs -0
4.6. 7, where A 
k! vy 
5.1. Use log- l+xt xi <1 
K 
5.2. Use -=1 for K> 1 
—1 
6 1 _ ( B, \ Bp 
a, + By/\a, + B, 
be A | 
2. = + a 
62. Foot rea A+ pe 
— - 24+V2 , 
7.1. f(t) — 2= V2 aay + ot V2 oven 
4 4 
V2 . V2 . 
7.2. fol) = yee + Wes 
2 — V2 > 5 2 + V2 eer 
fi) = “yee _ yee 
Chapter VII 
1.1. The age δ_t of the item in service at time t cannot be greater than t.
1.2. F(t) = 1 - e^{-λt} - λt e^{-λt}.
1.3. (a) True; 
(b) False; 
(c) False. 
1.4. (a) W has a Poisson distribution with parameter λk.
f ek Ak n f -A(k+1) Ak + Av n 
(b) Pr{N(t) = = > —_— yy omy 
= 1=0 n! 
2.1. 1/a + 1/b.
2.2. The system starts from an identical condition at those instants
when first both components are OFF.


2.3. n M(n) u(n) 
1 0.4 0.4 
2 0.66 0.26 
3 1.104 0.444 
4 1.6276 0.5236 
5 2.0394 0.41184 
6 2.4417 0.4023 
7 2.8897 0.44798 
8 3.3374 0.44769 
9 3.7643 0.42693 

10 4.1947 0.4304 
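The columns M(n) and u(n) in 2.3 are consistent with the discrete renewal recursion u(n) = p_n + Σ_{k=1}^{n-1} p_k u(n-k), with M(n) = u(1) + ... + u(n). The lifetime distribution is not printed in this answer key; the sketch assumes p_1 = 0.4, p_2 = 0.1, p_3 = 0.3, p_4 = 0.2, values backed out from the first four rows of the table. The function name is illustrative.

```python
def discrete_renewal(p, horizon):
    """Return lists M, u (index 0 unused) from the discrete renewal equation.

    p maps a lifetime k >= 1 to its probability p_k.
    """
    u = [0.0] * (horizon + 1)
    M = [0.0] * (horizon + 1)
    for n in range(1, horizon + 1):
        # A renewal occurs at n if the first lifetime is n, or the first
        # lifetime is k < n and a renewal occurs n - k periods later.
        u[n] = p.get(n, 0.0) + sum(p.get(k, 0.0) * u[n - k] for k in range(1, n))
        M[n] = M[n - 1] + u[n]
    return M, u


# Assumed lifetime mass function (inferred from the table, not from the text).
lifetime_pmf = {1: 0.4, 2: 0.1, 3: 0.3, 4: 0.2}
M, u = discrete_renewal(lifetime_pmf, 10)

for n in range(1, 11):
    print(n, round(M[n], 4), round(u[n], 5))
```

With this assumed distribution the mean lifetime is 2.3, so u(n) should settle near 1/2.3 ≈ 0.435, in line with the later rows of the table.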


l 
3.1. e*stt+—. 
Xv 


3.2. The inter-recording time is a random sum with a geometric number
of exponential terms, and is exponential with parameter λp. (See
the example following II, (3.13)).


3.3. Pr{M(t) =n, Ways, >t + 5} 
{re “as 
= ye 
n! 


41. M(t) ~3t— %. 


4.2. 3. 
V 273 — 15 
43. T= —— . 


4.4. c(T) is a decreasing function of T. 
4.5. h(x) = 2(1 - x) for 0 ≤ x ≤ 1.


4.6. (a) et 
a
I : 
(0) - + B 
(c) ! 
at Bp 
51, —HA_ 
1+ pa 
5.2. ¢ 
5.3. T* =« 
6.1., 6.2.   n     v_n        u_n
             0    0.6667    1.33333
             1    1.1111    0.88888
             2    0.96296   1.03704
             3    1.01235   0.98765
             4    0.99588   1.00412
             5    1.00137   0.99863
             6    0.99954   1.00046
             7    1.00015   0.99985
             8    0.99995   1.00005
             9    1.00002   0.99998
            10    0.99999   1.00001


6.3. E[X] = 1; 2), = 1.


Chapter VIII


1.1. (a) 0.8413.
(b) 4.846.


1.2. Cov[W(s), W(t)] = min{s, t}. 


0 ; 
13. P= lyyry2 - 1); 


ot 
op 
Ox 


= —290). 




1.4. (a) 0. 
(b) 3u^2 + 3uv + uw.


1.5. (a) ee"; 
(b) t(1 - s) for 0 < t < s < 1;
(c) min{s, t}.


1.6. (a) Normal, μ = 0, σ^2 = 4u + v;
(b) Normal, μ = 0, σ^2 = 9u + 4v + w.


5 


1.7. —. 
S 
2.1. (a) 0.6826. 
(b) 4.935. 
2.2. If tan θ = √(s/t), then cos θ = √(t/(s + t)).
2.3. 0.90. 


2.4. Reflection principle: Pr{M_n ≥ a, S_n < a} =

Pr{M_n ≥ a, S_n > a} (= Pr{S_n > a}).

Also, Pr{M_n ≥ a, S_n = a} = Pr{S_n = a}.
2.5. Pr{τ_0 < t} = Pr{B(u) ≠ 0 for all t ≤ u < a} = 1 - ϑ(t, a).
2.6. Pr{τ_1 < t} = Pr{B(u) = 0 for some u, b < u < t}.
3.1. Pr{R(t) < y | R(0) = x}

= Pr{-y < B(t) < y | B(0) = x}

= Pr{B(t) < y | B(0) = x} - Pr{B(t) < -y | B(0) = x}.
3.2. 0.3174, 0.15735. 
3.3. 0.83995. 
3.4. 0.13534. 


3.5. No, No. 
4.1. 0.05.
4.2. 0.5125, 0.6225, 0.9933.
4.3. 0.25, 24.5, 986.6.
4.4. t = 43.3 versus E[T] = 21.2.
4.5. 0.3325.
4.6. (a) E[(ξ - a)^+] = ∫_a^∞ x φ(x) dx - a ∫_a^∞ φ(x) dx
         = φ(a) - a[1 - Φ(a)].
     (b) (X - b)^+ = σ(ξ - (b - μ)/σ)^+.
5.1. 0.8643, 0.7389, 0.7357.
5.2. 0.03144, 0.4602, 0.4920.
5.3.


(a) E[V_n] = (1 - β)^n v
Cov[V,, Vise) = (1 — BY. 

(b) E[ΔV | V_n = v] = -βv
Var[ΔV | V_n = v] = 1.


Chapter IX 


1.1. (a) Probability waiting planes exceed available air space.
     (b) Mean number of cars in lot.

1.2. The standard deviation of the exponential distribution equals the
     mean.

1.3. 1} days.

2.1. 25/36.




2.3. L=1 versus L=5. 
31. vet retpa—e 
bo pe l—p 
3.2. L = ρ + ρ^2 (1 + α)/(2α(1 - ρ)).
3.3. 334. 


3.4. % versus 3. 
3.5. M(t) = λ ∫_0^t [1 - G(y)] dy → λν.


4.1. 8 per hour. 


42. & 
4.3. 8.55. 
4.4. 0.0647. 


174, _ 4 
9.1. 565) = 35. 


5.2. Pr{Z< 20} = 0.9520 
Index


Absorbed Brownian motion 499-502, 507 
Absorbing Markov chain 123-125 
Absorbing state 105, 120, 170 

in birth and death process 379-384 
Absorption probability 165 
Absorption time 173 

in birth and death process 382-384 
Accounts receivable model 167, 230 
Addition law 6 
Age 45, 422, 424, 425, 426 

in a Poisson process 434 

in a renewal process 146 

limiting distribution of 443 
Age replacement 45, 221-223, 438, 446 
Airline reservation model 234 
Aperiodic 239, 246 
Arrival process 590 
Astronomy application 314 
Availability 217, 228, 234, 403 
Average fraction inspected 220, 228 
Average outgoing quality 220, 228 


Backward equations 359 

Bacteria 30, 289, 315 

Balking 567, 581 

Basic limit theorem of Markov chain 246 
Batch processing 150-151, 296 
Bernoulli distribution 24 

Bernoulli random variables sum of 280, 289 
Beta distribution 38, 83 

Beta function 55 

Bid model 69, 86, 149, 331 

Binary message 100, 104 


Binomial distribution 25, 293 
convolution of 32 
Binomial expansion 341 
Binomial theorem 55 
Bird problem 437 
Birth and death process 547 
limiting distribution of 366-368 
postulates for 355-356 
with absorbing states 379-384 
Birth process 333-339 
linear 346 
Black-Scholes 516-521 
Block replacement 428-430, 432, 456 
Borel set 19 
Branching process 91, 177-195 
multiple 192-195 
Brand switching model 229 
Breakdown rule 350, 354 
Brownian bridge 502, 522 
Brownian meander 504 
Brownian measure 534 
Brownian motion 474-479 
absorbed 499-502 
geometric 514-516 
integrated 535 
maximum of 491-493, 513 
of pollen grain 473, 526 
reflected 498-499 
standard 478 
two dimensional 506 
with drift 508-511 
Burn-in 403-405 
Busy period 551 
Cable fatigue model 349-353, 354
Cable strength model 483 
Call option 516, 523, 531 
Car problem 258 
Carrying capacity 386, 390 
Cash management 162-164, 488 
Central limit principle 483, 508, 533 
Central limit theorem 34, 75, 479 
Certain event 6 
Chapman—Kolmogorov equation 356, 394 
Chebyshev inequality 93 
Chemical reaction 354, 378 
Cigar problem 134-135 
Collards 365 
Communicate 235, 258 
Compound Poisson process 318-320 
Compression testing model 490 
Computer challenge 135, 168-169, 176, 
290, 311, 490 
Computer coding 288 
Conditional 
density function 71, 79 
dependence 85 
distribution function 71 
expectation 81 
expected value 60 
independence 85 
mass function 57 
probability 14 
Continuous random variable 8 
Continuous sampling 218-221 
Control limit rule 225 
Convolution 12 
of binomial distribution 32 
of geometric distribution 32 
of Poisson 268 
Correlation coefficient 12, 22, 39, 85 
Counter model 427, 436 
Covariance 11, 86 
matrix 481 
Covariance function 482 
for Brownian bridge 507 
for Brownian motion 479 
of Ornstein—Uhlenbeck process 525 




Cox process 273-274, 408-411 
Crack failure model 64-68, 324, 343 
Craps 148 

Cumulative process 449, 451 
Current life, See age 

Customer waiting time 542 


Damage 134 
Death process 345-349 
linear 346-349 
Defects 274, 316 
Delayed renewal process 447 
Departure process 582, 590 
Detector model 310 
Dice 
five before seven 134 
four before seven 65, 68 
shaved 66-68 
weighted 69-70 
Differential equations 335, 337, 344, 415, 
523, 557 
and limiting distribution 367 
backward 359 
for Cox process 409-410 
for M(t) 361 
for two state Markov chain 363 
forward 361, 367 
Kolmogorov 359, 361 
Diffusion equation 475 
Discount factor 132, 300 
Discrete random variable 8 
Disease spread 99, 342 
Disjoint 6 
Dispatch model 278 
Dispersal of offspring 183 
Distribution function 7 
Do men have more sisters 64 
Doubly stochastic 206, 260 
Poisson process 273 


Ehrenfest urn model 108, 532-533, 539 
Electron multiplier model 78, 178 
Elementary renewal theorem 437 
Embedded Markov chain 396, 560, 563 




Embedded random walk 380 
Empirical distribution function 503 
Equal load sharing 351, 483 
Equally likely outcomes 3 
Excess life 115, 414, 422, 424, 425, 426, 
462 
for Poisson process 433 
limiting distribution of 443 
Expected value 9 
of sum 12 
Exponential distribution 35-37, 46-50 
Extinction 345, 347, 355, 388-392 
mean time to 385, 390-393 
Extinction probability 181-183, 384, 393 
and generating function 187-190 


Failure 446 
crack model for 324 
Failure model 115 
Failure rate 36, 41, 350 
Fair game 88, 143 
Family name survival 179 
Family size 184, 195, 197 
Fatigue 324, 343, 349-353 
Fecundity model 125-127 
Feedback 569, 592 
Fiber, See filament 
Fibrous composite 316 
Filament, strength of 326-328, 411-413 
First come-first served 542, 239 
First return 240, 245 
First return 
mean 255, 257 
First step analysis 101, 116-125, 169-174, 
255, 261-262, 360, 380 
infinitesimal 508, 509, 510 
Five before seven 134 
Flashlight problem 52-53 
Flea beetle model 365, 557 
Forward equations 361, 367, 376, 409 
Four before seven 65, 68 
Fundamental matrix 172, 175 


Gauge length 326 
Gambler 90-91
Gambler’s ruin 94, 142, 144, 148, 152, 
156, 509 
mean duration of 167 
Gambling 22, 88, 94 
Game show 150 
Gamma distribution 38, 276, 291 
Gamma function 53 
Gaussian process 481, 506 
Gene frequency model 108-111, 179, 
372-376, 528-529 
Generating function 185, 276 
and extinction probability 187-190 
and sums of rv’s. 185, 190-192 
Geometric Brownian motion 514-516 
Geometric distribution 25-26, 33, 339 
convolution of 32 
generalized 33 
Geometric series 56 


Hazard rate 36, 350 

History 215-216 

Hitting probability 173 

Hitting time 152, 493, 509 
mean of 165 

Home helicopter 342 


Idle period 551 
Immigration 342, 361, 368 
Impossible event 6 
Impulse response function 304 
Imputed volatility 520 
Independent events 7 
Independent increments 270, 477 
Independent Poisson processes 321 
Independent random variables 11 
defining a Markov chain 138-141 
Indicator 
function 45 
of event 24 
random variable 21 
Industrial mobility 398-402 
Infinitesimal first step analysis 508, 509, 
510 
Infinitesimal generator 356, 396
Infinitesimal matrix 359 
Infinitesimal parameter 339 
Infinitesimal probability 333, 334, 395 
Input process 541 
Input rate 570 
Integral equation 338 
Integrated Brownian motion 535 
Intersection 6 
Invariance principle 480, 482, 488 
Inventory model 106-108, 112-113, 114, 
115, 230 
Irreducible 246, 258 
Irreducible Markov chain 236 
Joint distribution 11 
L = AW 542-544 
Laplace transform 311, 414 
Last step analysis 360, 409 
Law of large numbers 4 
Law of rare events 27, 28, 279-286, 288 
Law of total probability 6-7, 14, 49, 58, 
60, 65, 71, 72, 82, 87, 94, 101, 119, 
120, 121, 268, 335, 409, 458, 461, 
559 
Lazy professor 457 
Learning model 297, 434 
Length biased sampling 436 
Limiting distribution 199, 254, 257, 377 
and differential equations 367 
for birth and death process 366-368
interpretation of 207 
of age and excess life 443-445 
Limiting transition probability 212, 232 
Linear birth process, See Yule process 
Linear death process 346-349, 353, 355 
Linear growth 346 
with immigration 361, 368 
Loaded dice 69-70 
Logistic process 371-372 
Lognormal distribution 35 
Long run relative frequency 4 


M/G/1 queue 545, 558-563 
M/G/∞ queue 564-565
M/M/1 queue 544, 548-552 




M/M/S queue 553-555, 556 
M/M/∞ queue 544, 552-553, 557
Marginal distribution 11 
Marked Poisson process 321-324, 564 
Markov chain 95 
embedded 396 
in continuous time 394-396
two state 136-138, 200, 362-364, 
396-398, 407, 431 
Markov inequality 88-89, 93 
Markov process 95 
Marlene, see also Martha 470 
Martha, see also Marlene 151 
Martingale 87-94, 151, 184, 297, 490, 
507, 524 
systems theorems 88 
Match 21 
Maximal inequality 89-90 
Maximum of Brownian motion 491-493, 
513 
Maze 122-123, 131, 166 
Mean 9 
Measurable 17 
Median 9, 40 
Memoryless property 35, 48, 348, 585, 
596 
Mode 31 
Moment 9 
Multinomial distribution 29-30 


Naomi 51 
Negative binomial distribution 25-26, 59, 
84 

Neutron chain reaction 178 

Normal distribution 34 
bivariate 39 
conditional 86 
multivariate 40, 481 
notation for 475 
table of 486 

Null recurrent 246 


o(·) notation 9, 271-272, 276-277, 334,
344
Offspring 73 




Optimal replacement model 224-228 

Option pricing 516-521 

Ornstein—Uhlenbeck process 524-538 
and Brownian motion 525, 537 
stationary 532 

Overflow 545, 570, 580 


Partial sums 140-141, 176, 480 
Pattern search 114, 130 
Period of state 237 
Periodic 235, 254 
Peter principle 398-402, 450 
Planned replacement, See Age replacement
Poisson 
decomposition 268-269 
sum of 268 
Poisson counting process 290 
Poisson distribution 26-29, 196, 258, 
267-268 
and uniform distribution 63 
convolution of 31 
Poisson point process 290 
definition 284 
nonhomogeneous 323 
postulates for 282 
Poisson process 270-273 
as a renewal process 427, 432-435 
compound 318-320 
doubly stochastic 273 
marked 321-324, 564 
mixed 273, 278 
nonhomogeneous 271-273 
postulates for 334, 344 
repositioning of 329 
spatial 311-314 
with Markov intensity 408-411 


Population size model 70, 335, 339-341,
341, 346-349, 371-372, 384
with age structure 463-469 
Population stability 193 
Position process 530, 536 
Positive recurrent 246 
Preemptive priority 572-579, 580 
Premium 517 
Priority, preemptive 572-579, 580

Priority queue 545 

Probability density function 8 

Probability mass function 8 

Probability measure 15 

Probability of absorption 165 

Probability of extinction 181-183 

Probability space 16 

Process control 453 

Product form 592 

Production process 30, 100, 104, 148, 213, 
218-221, 231 

Pure birth process, See birth process 

Pure death process, See death process 


Queue discipline 531 

Queueing model 70, 114, 450 
discrete 111-112, 149 

Queueing network 545 
general open 592-598 
open acyclic 581-591 

Quiz show 84 


Radioactive decay 294, 301, 364 
alpha particles 329 
Random environment 84, 70-77 
Random sum 180 
distribution of 74-75 
moments of 72 
Random variable 7 
continuous 8 
discrete 8 
Random walk 141-145, 151, 156-161, 
162, 480, 488, 489, 496 
simple 145, 177 
Random walk hypothesis 318 
Recurrent 246 
Recurrent state 240, 241 
Reducible Markov chain 258-264 
Redundancy 216-218, 403-405 
Reflected Brownian motion 498-499 
Reflection principle 491-493, 497 
Regular Markov chain 199 
Regular transition matrix 199-205, 257 
Reliability model 216-218, 228, 234 
Reneging 581
Renewal argument 458 
Renewal equation 458 
Renewal function 421 
for Poisson process 433 
Renewal process 146 
definition 419 
delayed 447-448 
in Markov chain 428 
in queues 427 
stationary 448
Renewal theorem 441, 462, 551, 558 
elementary 439 
Repair model 407 
Repairman model 369-371, 376, 377, 545, 
546 
Replacement model 150, 221-228, 438, 
450, 452 
optimal 224-228 
Reservoir model 496, 506 
Risk theory 70, 318, 452-453 
Rumor spread 134 


σ-algebra 16
Sample space 15 
Sampling 
continuous 218-221 
length biased 434, 436 
sum quota 305-308, 435 
wildlife 71 
Sequential decisions 511-513, 521, 522 
Sequential hypothesis test 513 
Server utilization 542 
Service distribution 531 
Shaved dice 66-68 
Shewhart control chart 453
Shock model 277, 278, 319, 328, 329 
Shot noise 304-305, 310, 311
Sigma-algebra 16 
Size effect 325 
Social mobility 200, 210 
Sojourn 336, 357-359, 395, 447 
in Poisson process 290, 292 
Spatial point process 311-314 




Spread 
of disease 99, 342 
of rumor 134 
Stable age distribution 469 
State space 5 
Stationary 
Cox process 414 
distribution 247, 252, 256, 258, 366, 
376-378 
intensity 409 
Ornstein—Uhlenbeck process 532 
renewal process 448-449 
transition probability 96 
Sterile male insect control 388-392 
Stirling’s formula 54 
Stochastic 2 
Stochastic process 5 
Stock prices 75-77, 88, 318, 474, 489, 
506, 510, 514, 522, 531 
Stop and go traveler 366, 416-417 
Strength 
compression 490 
of cable 483 
of filaments 326-328 
Striking price 517 
Subjective probability 5 
Success runs 145-147, 165-166 
Successive maxima 138-140 
Sum quota sampling 305-308, 435 
Sums of numbers 56 
System 50-51, 231, 378, 379, 406, 407, 
431, 446, 456 
System throughput 542 


Tail probability 43-46, 52, 422 
Tandem queues 583 
Tensile strength 411-413 
Time reversal 588-591 
Total life 422, 457 
for Poisson process 434 
Tracking error 527 
Traffic flow 427 
Traffic intensity 549, 563, 573 
Transient state 120, 170, 235, 240, 241 




Transition probability 95 
for two state Markov chain 363-364
limit of 212, 232 
matrix 96 
n-step 101 
regular 199-205 
stationary 96


Uncorrelated 11 
Uniform distribution 37, 93, 290, 311 
and Poisson process 297-299
discrete 30 
Union 6 
Urn 63, 91-93, 108, 113, 115, 132, 134, 
168-169, 211 


Variable service rates 569 
Variance 9 

Velocity 538 

Volatility 520 
Waiting time
in Poisson process 290, 296, 309 
in renewal process 420 

Wald’s approximation 513 

Warrant 516 

Wear 134 

Weather model 215-216, 232, 238,
455

Weibull distribution 486 

Weighted dice 69-70 

Wiener process 474

Wildlife sampling 71 


You Be the Judge 28-29 
Yule process 339-341, 342, 346 
with immigration 342 


Zero-seeking device 130, 175 
Zeros of Brownian motion 494-496,
497


